[jira] [Updated] (HDFS-5042) Completed files lost after power failure

2017-05-22 Thread Vinayakumar B (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinayakumar B updated HDFS-5042:

Attachment: HDFS-5042-02.patch

Updated the patch to fix the javadoc and checkstyle issues.

> Completed files lost after power failure
> 
>
> Key: HDFS-5042
> URL: https://issues.apache.org/jira/browse/HDFS-5042
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: ext3 on CentOS 5.7 (kernel 2.6.18-274.el5)
>Reporter: Dave Latham
>Priority: Critical
> Attachments: HDFS-5042-01.patch, HDFS-5042-02.patch, 
> HDFS-5042-branch-2-01.patch
>
>
> We suffered a cluster wide power failure after which HDFS lost data that it 
> had acknowledged as closed and complete.
> The client was HBase which compacted a set of HFiles into a new HFile, then 
> after closing the file successfully, deleted the previous versions of the 
> file.  The cluster then lost power, and when brought back up the newly 
> created file was marked CORRUPT.
> Based on reading the logs it looks like the replicas were created by the 
> DataNodes in the 'blocksBeingWritten' directory.  Then when the file was 
> closed they were moved to the 'current' directory.  After the power cycle 
> those replicas were again in the blocksBeingWritten directory of the 
> underlying file system (ext3).  When those DataNodes reported in to the 
> NameNode it deleted those replicas and lost the file.
> Some possible fixes could be to have the DataNode fsync the directory(s) after 
> moving the block from blocksBeingWritten to current, so that the rename is 
> durable, or to have the NameNode accept replicas from blocksBeingWritten under 
> certain circumstances.
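
A minimal sketch of the fsync-the-directory option, assuming a POSIX filesystem where forcing an open directory channel flushes its entries (illustrative Java, not the actual DataNode code):

{code}
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class DurableRename {
    /** Rename a finalized replica and fsync both parent directories so the
     *  rename survives a power failure (hedged sketch, not HDFS code). */
    public static void renameDurably(Path src, Path dst) throws IOException {
        Files.move(src, dst, StandardCopyOption.ATOMIC_MOVE);
        fsyncDirectory(dst.getParent()); // persist the new directory entry
        fsyncDirectory(src.getParent()); // persist the removal of the old one
    }

    private static void fsyncDirectory(Path dir) throws IOException {
        // On Linux a directory can be opened read-only and force()d to
        // flush its metadata; this is not portable to all platforms.
        try (FileChannel ch = FileChannel.open(dir, StandardOpenOption.READ)) {
            ch.force(true);
        }
    }
}
{code}
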
> Log snippets from RS (RegionServer), NN (NameNode), DN (DataNode):
> {noformat}
> RS 2013-06-29 11:16:06,812 DEBUG org.apache.hadoop.hbase.util.FSUtils: 
> Creating 
> file=hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c
>  with permission=rwxrwxrwx
> NN 2013-06-29 11:16:06,830 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> NameSystem.allocateBlock: 
> /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c.
>  blk_1395839728632046111_357084589
> DN 2013-06-29 11:16:06,832 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block 
> blk_1395839728632046111_357084589 src: /10.0.5.237:14327 dest: 
> /10.0.5.237:50010
> NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> NameSystem.addStoredBlock: blockMap updated: 10.0.6.1:50010 is added to 
> blk_1395839728632046111_357084589 size 25418340
> NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> NameSystem.addStoredBlock: blockMap updated: 10.0.6.24:50010 is added to 
> blk_1395839728632046111_357084589 size 25418340
> NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> NameSystem.addStoredBlock: blockMap updated: 10.0.5.237:50010 is added to 
> blk_1395839728632046111_357084589 size 25418340
> DN 2013-06-29 11:16:11,385 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Received block 
> blk_1395839728632046111_357084589 of size 25418340 from /10.0.5.237:14327
> DN 2013-06-29 11:16:11,385 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for block 
> blk_1395839728632046111_357084589 terminating
> NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: Removing 
> lease on  file 
> /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c
>  from client DFSClient_hb_rs_hs745,60020,1372470111932
> NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: DIR* 
> NameSystem.completeFile: file 
> /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c
>  is closed by DFSClient_hb_rs_hs745,60020,1372470111932
> RS 2013-06-29 11:16:11,393 INFO org.apache.hadoop.hbase.regionserver.Store: 
> Renaming compacted file at 
> hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c
>  to 
> hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/n/6e0cc30af6e64e56ba5a539fdf159c4c
> RS 2013-06-29 11:16:11,505 INFO org.apache.hadoop.hbase.regionserver.Store: 
> Completed major compaction of 7 file(s) in n of 
> users-6,\x12\xBDp\xA3,1359426311784.b5b0820cde759ae68e333b2f4015bb7e. into 
> 6e0cc30af6e64e56ba5a539fdf159c4c, size=24.2m; total size for store is 24.2m
> ---  CRASH, RESTART  ---
> NN 2013-06-29 12:01:19,743 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> NameSystem.addStoredBlock: addStoredBlock request received for 
> blk_1395839728632046111_357084589 on 10.0.6.1:50010 size 21978112 but was 
> rejected: Reported as block being written but is a block of closed file.
> NN 2013-06-29 12:01:19,743 INFO 

[jira] [Updated] (HDFS-11695) [SPS]: Namenode failed to start while loading SPS xAttrs from the edits log.

2017-05-22 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-11695:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: HDFS-10285
   Status: Resolved  (was: Patch Available)

Good work Surendra. I have just pushed this to trunk.

> [SPS]: Namenode failed to start while loading SPS xAttrs from the edits log.
> 
>
> Key: HDFS-11695
> URL: https://issues.apache.org/jira/browse/HDFS-11695
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS-10285
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Blocker
> Fix For: HDFS-10285
>
> Attachments: fsimage.xml, HDFS-11695-HDFS-10285.001.patch, 
> HDFS-11695-HDFS-10285.002.patch, HDFS-11695-HDFS-10285.003.patch, 
> HDFS-11695-HDFS-10285.004.patch, HDFS-11695-HDFS-10285.005.patch
>
>
> {noformat}
> 2017-04-23 13:27:51,971 ERROR 
> org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
> java.io.IOException: Cannot request to call satisfy storage policy on path 
> /ssl, as this file/dir was already called for satisfying storage policy.
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSatisfyStoragePolicy(FSDirAttrOp.java:511)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirXAttrOp.unprotectedSetXAttrs(FSDirXAttrOp.java:284)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:918)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:241)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:150)
> {noformat}
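
A minimal, self-contained sketch of the kind of idempotency guard that avoids this startup failure; the names here are assumptions for illustration, not the committed patch:

{code}
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: treat a satisfy-storage-policy xattr replayed from
 *  the edit log as a no-op when it is already present, so NameNode startup
 *  is not aborted. */
public class SatisfyXAttrReplaySketch {
    private final Set<String> pathsWithSatisfyXAttr = new HashSet<>();

    void unprotectedSatisfyStoragePolicy(String path, boolean fromEditLog)
            throws IOException {
        if (pathsWithSatisfyXAttr.contains(path)) {
            if (fromEditLog) {
                return; // idempotent replay: xattr already present
            }
            throw new IOException("Cannot request to call satisfy storage policy"
                + " on path " + path
                + ", as this file/dir was already called for satisfying storage policy.");
        }
        pathsWithSatisfyXAttr.add(path);
    }

    public static void main(String[] args) throws IOException {
        SatisfyXAttrReplaySketch fsd = new SatisfyXAttrReplaySketch();
        fsd.unprotectedSatisfyStoragePolicy("/ssl", false); // original user call
        fsd.unprotectedSatisfyStoragePolicy("/ssl", true);  // edit-log replay: no-op
        System.out.println("NameNode startup survives the replay");
    }
}
{code}
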






[jira] [Commented] (HDFS-11695) [SPS]: Namenode failed to start while loading SPS xAttrs from the edits log.

2017-05-22 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020652#comment-16020652
 ] 

Uma Maheswara Rao G commented on HDFS-11695:


+1 on the latest patch

> [SPS]: Namenode failed to start while loading SPS xAttrs from the edits log.
> 
>
> Key: HDFS-11695
> URL: https://issues.apache.org/jira/browse/HDFS-11695
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS-10285
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Blocker
> Attachments: fsimage.xml, HDFS-11695-HDFS-10285.001.patch, 
> HDFS-11695-HDFS-10285.002.patch, HDFS-11695-HDFS-10285.003.patch, 
> HDFS-11695-HDFS-10285.004.patch, HDFS-11695-HDFS-10285.005.patch
>
>
> {noformat}
> 2017-04-23 13:27:51,971 ERROR 
> org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
> java.io.IOException: Cannot request to call satisfy storage policy on path 
> /ssl, as this file/dir was already called for satisfying storage policy.
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSatisfyStoragePolicy(FSDirAttrOp.java:511)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirXAttrOp.unprotectedSetXAttrs(FSDirXAttrOp.java:284)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:918)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:241)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:150)
> {noformat}






[jira] [Commented] (HDFS-11708) positional read will fail if replicas moved to different DNs after stream is opened

2017-05-22 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020641#comment-16020641
 ] 

Vinayakumar B commented on HDFS-11708:
--

bq. Not sure I understand what you are trying to fix here. It looks like 
chooseDataNode() already calls refreshLocatedBlock() in the case when all other 
locations have failed. So your patch adds another call to the NameNode 
unconditionally on retry, which is probably not the best thing to do from a 
performance viewpoint.
I think you may have been confused by the name {{refreshLocatedBlock()}}, which 
actually DOES NOT fetch locations from the NameNode if they are already cached 
for the specific position. It just picks the specific LocatedBlock from the 
already-fetched LocatedBlocks. The actual NameNode call to refetch happens in 
{{openInfo(true)}} in {{chooseDatanode()}}.
The issue was that the chosen LocatedBlock was stale: even though the cached 
locations were updated internally during retry in {{chooseDatanode()}}, the 
retry continued with the old reference to the LocatedBlock, which was created 
outside the {{while}} loop. Now, on retry, it chooses the LocatedBlock again 
from the newly cached locations.

bq. It would be good if you could provide a unit test that fails without your 
fix.
I have already included a test in the patch: 
{{TestPread#testPreadFailureWithChangedBlockLocations()}}.
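
A self-contained toy model of the stale-reference problem; the names mirror the HDFS client but the types are simplified stand-ins, not the real {{DFSInputStream}} code:

{code}
import java.io.IOException;
import java.util.*;

public class PreadRetrySketch {
    static class LocatedBlock {
        final List<String> locations;
        LocatedBlock(List<String> locs) { locations = locs; }
    }

    // Cached block list; chooseDataNode() refetches it when all locations fail.
    static LocatedBlock cached = new LocatedBlock(Arrays.asList("DN1", "DN2"));

    static LocatedBlock refreshLocatedBlock(LocatedBlock old) {
        return cached; // re-pick from the (possibly updated) cache, no NN call
    }

    static String chooseDataNode(LocatedBlock blk, Set<String> dead)
            throws IOException {
        for (String dn : blk.locations) {
            if (!dead.contains(dn)) {
                return dn;
            }
        }
        // All locations failed: simulate openInfo(true) refetching from the NN.
        cached = new LocatedBlock(Arrays.asList("DN1", "DN3"));
        throw new IOException("all locations failed, locations refreshed");
    }

    public static void main(String[] args) {
        Set<String> dead = new HashSet<>(Arrays.asList("DN1", "DN2"));
        LocatedBlock blk = refreshLocatedBlock(cached); // created before the loop
        for (int attempt = 0; attempt < 2; attempt++) {
            try {
                System.out.println("reading from " + chooseDataNode(blk, dead));
                return;
            } catch (IOException e) {
                // The fix: re-choose the LocatedBlock on retry. Without this
                // line the retry keeps the stale [DN1, DN2] reference and fails.
                blk = refreshLocatedBlock(blk);
            }
        }
    }
}
{code}
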

> positional read will fail if replicas moved to different DNs after stream is 
> opened
> ---
>
> Key: HDFS-11708
> URL: https://issues.apache.org/jira/browse/HDFS-11708
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.7.3
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
>Priority: Critical
>  Labels: release-blocker
> Attachments: HDFS-11708-01.patch, HDFS-11708-02.patch, 
> HDFS-11708-03.patch, HDFS-11708-04.patch
>
>
> Scenario:
> 1. File was written to DN1 and DN2 with RF=2.
> 2. File stream opened for reading and kept open. Block locations are [DN1, DN2].
> 3. One of the replicas (DN2) was moved to another datanode (DN3) due to 
> datanode death/balancing/etc.
> 4. The latest block locations in the NameNode will be DN1 and DN3, in the 
> 'same order'.
> 5. DN1 went down, but is not yet detected as dead by the NameNode.
> 6. Client starts reading using the positional read API "read(pos, buf[], 
> offset, length)"






[jira] [Commented] (HDFS-11419) BlockPlacementPolicyDefault is choosing datanode in an inefficient way

2017-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020621#comment-16020621
 ] 

Hudson commented on HDFS-11419:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11768 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11768/])
HDFS-11419. Performance analysis of new DFSNetworkTopology#chooseRandom. (arp: 
rev d0f346af26293f0ac8d118f98628f5528c1d6811)
* (add) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/net/TestDFSNetworkTopologyPerformance.java


> BlockPlacementPolicyDefault is choosing datanode in an inefficient way
> --
>
> Key: HDFS-11419
> URL: https://issues.apache.org/jira/browse/HDFS-11419
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>
> Currently in {{BlockPlacementPolicyDefault}}, {{chooseTarget}} will end up 
> calling into {{chooseRandom}}, which first finds a random datanode by 
> calling
> {code}DatanodeDescriptor chosenNode = chooseDataNode(scope, 
> excludedNodes);{code}, then checks whether that returned datanode 
> satisfies the storage type requirement
> {code}storage = chooseStorage4Block(
>   chosenNode, blocksize, results, entry.getKey());{code}
> If yes, {{numOfReplicas--;}}; otherwise, the node is added to the excluded 
> nodes, and the loop runs again until {{numOfReplicas}} is down to 0.
> A problem here is that the storage type is not considered until after a 
> random node has already been returned.  We've seen a case where a cluster has a 
> large number of datanodes, while only a few satisfy the storage type 
> condition. So, for the most part, this code blindly picks random datanodes 
> that do not satisfy the storage type requirement.
> To make matters worse, the way {{NetworkTopology#chooseRandom}} works is 
> that, given a set of excluded nodes, it first finds a random datanode, and 
> then, if it is in the excluded-nodes set, it tries to find another random 
> node. So the more excluded nodes there are, the more likely a random node 
> will be in the excluded set, in which case we have basically wasted one 
> iteration (a toy model of this waste follows the description).
> Therefore, this JIRA proposes to augment/modify the relevant classes so 
> that datanodes can be found more efficiently. There are currently two 
> different high-level solutions we are considering:
> 1. add some field to the Node base types to describe the storage type info, 
> and when searching for a node, take such field(s) into account, and do not 
> return a node that does not meet the storage type requirement.
> 2. change the {{NetworkTopology}} class to be aware of storage types, e.g. 
> for each storage type there is one tree subset that connects all the nodes 
> with that type, and a search happens on only one such subset, so unexpected 
> storage types are simply not in the search space. 
> Thanks [~szetszwo] for the offline discussion, and thanks [~linyiqun] for 
> pointing out a wrong statement (corrected now) in the description. Any 
> further comments are more than welcome.
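
A self-contained toy model of the wasted iterations (illustrative names only, not HDFS code), with only 1% of nodes carrying the wanted storage type:

{code}
import java.util.*;

public class ChooseRandomWasteSketch {
    public static void main(String[] args) {
        int clusterSize = 1000;
        int matching = 10;          // only 1% of nodes have the wanted type
        int replicasNeeded = 3;
        Random rand = new Random(42);
        Set<Integer> excluded = new HashSet<>();
        int draws = 0;

        while (replicasNeeded > 0) {
            int node = rand.nextInt(clusterSize); // blind random pick
            draws++;
            if (excluded.contains(node)) {
                continue;           // wasted iteration: node already excluded
            }
            if (node < matching) {
                replicasNeeded--;   // storage type check passes
            } else {
                excluded.add(node); // wrong type: exclude and draw again
            }
        }
        System.out.println("random draws needed for 3 replicas: " + draws);
    }
}
{code}
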






[jira] [Updated] (HDFS-11535) Performance analysis of new DFSNetworkTopology#chooseRandom

2017-05-22 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-11535:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0-alpha3
   Status: Resolved  (was: Patch Available)

Committed this to trunk. Thanks for the contribution [~vagarychen].

> Performance analysis of new DFSNetworkTopology#chooseRandom
> ---
>
> Key: HDFS-11535
> URL: https://issues.apache.org/jira/browse/HDFS-11535
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
> Fix For: 3.0.0-alpha3
>
> Attachments: HDFS-11535.001.patch, HDFS-11535.002.patch, 
> HDFS-11535.003.patch, HDFS-11535.004.patch, PerfTest.pdf
>
>
> This JIRA is created to post the results of some performance experiments we 
> did.  For those who are interested, please see the attached .pdf file for 
> more detail. The attached patch file includes the experiment code we ran. 
> The key insight we got from these tests is that although *the new method 
> outperforms the current one in most cases*, there is still *one case where 
> the current one is better*: when there is only one storage type in the 
> cluster, and we also always look for this storage type. In this case, it 
> is simply a waste of time to perform storage-type-based pruning; blindly 
> picking a random node (the current method) suffices.
> Therefore, based on the analysis, we propose to use a *combination of both 
> the old and the new methods* (a sketch follows the description):
> say we search for a node of type X; since inner nodes now all keep storage 
> type info, we can *just check the root node to see if X is the only type it 
> has*. If yes, blindly picking a random leaf will work, so we simply call the 
> old method; otherwise we call the new method.
> There is still at least one missing piece in this performance test, which is 
> garbage collection. The new method creates a few more objects when doing 
> the search, which adds overhead to GC. I'm still thinking about potential 
> optimizations, but this seems tricky, and I'm not sure whether this 
> optimization is worth doing at all. Please feel free to leave any 
> comments/suggestions.
> Thanks [~arpitagarwal] and [~szetszwo] for the offline discussion.
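
A minimal sketch of the proposed dispatch, using simplified stand-in types for {{DFSNetworkTopology}}; the helper names are assumptions, not the actual API:

{code}
import java.util.*;

public class ChooseRandomDispatchSketch {
    enum StorageType { DISK, SSD }

    // Root inner node's storage-type summary and the leaves under it.
    static EnumSet<StorageType> rootTypes = EnumSet.of(StorageType.DISK);
    static List<String> leaves = Arrays.asList("dn1:DISK", "dn2:DISK", "dn3:DISK");
    static Random rand = new Random();

    static String chooseRandom(StorageType wanted) {
        if (rootTypes.size() == 1 && rootTypes.contains(wanted)) {
            // Old method: X is the only type in the tree, every leaf matches,
            // so a blind random pick suffices and pruning would be wasted work.
            return leaves.get(rand.nextInt(leaves.size()));
        }
        return chooseRandomWithStorageType(wanted); // new, type-aware method
    }

    static String chooseRandomWithStorageType(StorageType wanted) {
        // Stand-in for the new pruning search: consider matching leaves only.
        List<String> candidates = new ArrayList<>();
        for (String leaf : leaves) {
            if (leaf.endsWith(wanted.name())) {
                candidates.add(leaf);
            }
        }
        return candidates.isEmpty() ? null
            : candidates.get(rand.nextInt(candidates.size()));
    }

    public static void main(String[] args) {
        System.out.println(chooseRandom(StorageType.DISK)); // old path taken
    }
}
{code}
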






[jira] [Commented] (HDFS-11859) Ozone : separate blockLocationProtocol out of containerLocationProtocol

2017-05-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020596#comment-16020596
 ] 

Hadoop QA commented on HDFS-11859:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
38s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
35s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
37s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
40s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
38s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
27s{color} | {color:green} HDFS-7240 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
52s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs-client in HDFS-7240 
has 2 extant Findbugs warnings. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  2m  
4s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in HDFS-7240 has 10 
extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
28s{color} | {color:green} HDFS-7240 passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
6s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
14s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 72m 24s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
20s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}110m 57s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.cblock.TestBufferManager |
|   | hadoop.ozone.web.TestLocalOzoneVolumes |
|   | hadoop.ozone.scm.TestAllocateContainer |
|   | hadoop.ozone.web.TestDistributedOzoneVolumes |
|   | hadoop.ozone.ksm.TestKeySpaceManager |
|   | hadoop.cblock.TestLocalBlockCache |
|   | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover |
|   | hadoop.ozone.scm.TestContainerSQLCli |
|   | hadoop.ozone.web.TestOzoneWebAccess |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150 |
|   | 

[jira] [Commented] (HDFS-11153) RPC Client detect address changed should reconnect immediately

2017-05-22 Thread KWON BYUNGCHANG (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020593#comment-16020593
 ] 

KWON BYUNGCHANG commented on HDFS-11153:


This problem also occurs in the ResourceManager. 
I think this issue belongs under the Hadoop Common JIRA project.


> RPC Client detect address changed should reconnect immediately
> --
>
> Key: HDFS-11153
> URL: https://issues.apache.org/jira/browse/HDFS-11153
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ipc
>Affects Versions: 3.0.0-alpha1
>Reporter: DENG FEI
> Attachments: HDFS-1153.001.patch, stupid.png
>
>
> In HA mode, with _*"ipc.client.connect.max.retries.on.timeouts"*_ and 
> _*"ipc.client.connect.max.retries"*_ set to zero, if the active NN's IP 
> changes, the client will detect the change but won't reconnect, because it 
> has exceeded the max retry count; after 15 failovers it throws a connection 
> or standby exception.
> Maybe, if the address is found to have changed, the client should reconnect 
> immediately regardless of the retry limit (a sketch follows the log below).
> 
> log is below:
> {noformat}
> 2016-11-16 17:00:20,844 (WARN org.apache.hadoop.ipc.Client 510): Address 
> change detected. Old: *:9000 New: X:9000
> 2016-11-16 17:01:09,893 (WARN org.apache.hadoop.ipc.Client 510): Address 
> change detected. Old: *::9000 New: X:9000
> 2016-11-16 17:01:09,893 (WARN 
> org.apache.hadoop.io.retry.RetryInvocationHandler 118): Exception while 
> invoking class 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo.
>  Not retrying because failovers (15) exceeded maximum allowed (15)
> {noformat}
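
A self-contained toy model of the proposed behavior; the method names loosely mirror {{org.apache.hadoop.ipc.Client}} but are assumptions, not the real code:

{code}
import java.io.IOException;

public class ReconnectOnAddressChangeSketch {
    static int maxRetries = 0;          // ipc.client.connect.max.retries = 0
    static String cachedAddress = "10.0.0.1:9000";
    static String resolveActiveNN() { return "10.0.0.2:9000"; } // NN IP changed

    static void handleConnectionFailure(int curRetries, IOException ioe)
            throws IOException {
        String latest = resolveActiveNN();
        if (!latest.equals(cachedAddress)) {
            System.out.println("Address change detected. Old: " + cachedAddress
                + " New: " + latest);
            cachedAddress = latest;
            return;                     // reconnect immediately, ignore limit
        }
        if (curRetries >= maxRetries) {
            throw ioe;                  // unchanged address: honor retry limit
        }
    }

    public static void main(String[] args) throws IOException {
        // First failure: address changed, so no exception despite maxRetries=0.
        handleConnectionFailure(0, new IOException("connect failed"));
    }
}
{code}
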






[jira] [Updated] (HDFS-11153) RPC Client detect address changed should reconnect immediately

2017-05-22 Thread DENG FEI (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DENG FEI updated HDFS-11153:

Attachment: stupid.png

> RPC Client detect address changed should reconnect immediately
> --
>
> Key: HDFS-11153
> URL: https://issues.apache.org/jira/browse/HDFS-11153
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ipc
>Affects Versions: 3.0.0-alpha1
>Reporter: DENG FEI
> Attachments: HDFS-1153.001.patch, stupid.png
>
>
> In HA mode, with _*"ipc.client.connect.max.retries.on.timeouts"*_ and 
> _*"ipc.client.connect.max.retries"*_ set to zero, if the active NN's IP 
> changes, the client will detect the change but won't reconnect, because it 
> has exceeded the max retry count; after 15 failovers it throws a connection 
> or standby exception.
> Maybe, if the address is found to have changed, the client should reconnect 
> immediately regardless of the retry limit.
> 
> log is below:
> {noformat}
> 2016-11-16 17:00:20,844 (WARN org.apache.hadoop.ipc.Client 510): Address 
> change detected. Old: *:9000 New: X:9000
> 2016-11-16 17:01:09,893 (WARN org.apache.hadoop.ipc.Client 510): Address 
> change detected. Old: *::9000 New: X:9000
> 2016-11-16 17:01:09,893 (WARN 
> org.apache.hadoop.io.retry.RetryInvocationHandler 118): Exception while 
> invoking class 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo.
>  Not retrying because failovers (15) exceeded maximum allowed (15)
> {noformat}






[jira] [Comment Edited] (HDFS-11153) RPC Client detect address changed should reconnect immediately

2017-05-22 Thread DENG FEI (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020587#comment-16020587
 ] 

DENG FEI edited comment on HDFS-11153 at 5/23/17 2:47 AM:
--

As the default behavior, why detect that the host has changed but then do 
nothing? It's unreasonable, and we can't roll out corrected configuration to 
every client.  [~kihwal]


was (Author: deng fei):
As the default behavior, why detect that the host has changed but then do 
nothing? It's unreasonable, and we can't roll out corrected configuration to 
every client.  [~Kihwal Lee]

> RPC Client detect address changed should reconnect immediately
> --
>
> Key: HDFS-11153
> URL: https://issues.apache.org/jira/browse/HDFS-11153
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ipc
>Affects Versions: 3.0.0-alpha1
>Reporter: DENG FEI
> Attachments: HDFS-1153.001.patch
>
>
> In HA mode, with _*"ipc.client.connect.max.retries.on.timeouts"*_ and 
> _*"ipc.client.connect.max.retries"*_ set to zero, if the active NN's IP 
> changes, the client will detect the change but won't reconnect, because it 
> has exceeded the max retry count; after 15 failovers it throws a connection 
> or standby exception.
> Maybe, if the address is found to have changed, the client should reconnect 
> immediately regardless of the retry limit.
> 
> log is below:
> {noformat}
> 2016-11-16 17:00:20,844 (WARN org.apache.hadoop.ipc.Client 510): Address 
> change detected. Old: *:9000 New: X:9000
> 2016-11-16 17:01:09,893 (WARN org.apache.hadoop.ipc.Client 510): Address 
> change detected. Old: *::9000 New: X:9000
> 2016-11-16 17:01:09,893 (WARN 
> org.apache.hadoop.io.retry.RetryInvocationHandler 118): Exception while 
> invoking class 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo.
>  Not retrying because failovers (15) exceeded maximum allowed (15)
> {noformat}






[jira] [Commented] (HDFS-11153) RPC Client detect address changed should reconnect immediately

2017-05-22 Thread DENG FEI (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020587#comment-16020587
 ] 

DENG FEI commented on HDFS-11153:
-

As the default behavior, why detect that the host has changed but then do 
nothing? It's unreasonable, and we can't roll out corrected configuration to 
every client.  [~Kihwal Lee]

> RPC Client detect address changed should reconnect immediately
> --
>
> Key: HDFS-11153
> URL: https://issues.apache.org/jira/browse/HDFS-11153
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ipc
>Affects Versions: 3.0.0-alpha1
>Reporter: DENG FEI
> Attachments: HDFS-1153.001.patch
>
>
> In HA mode, with _*"ipc.client.connect.max.retries.on.timeouts"*_ and 
> _*"ipc.client.connect.max.retries"*_ set to zero, if the active NN's IP 
> changes, the client will detect the change but won't reconnect, because it 
> has exceeded the max retry count; after 15 failovers it throws a connection 
> or standby exception.
> Maybe, if the address is found to have changed, the client should reconnect 
> immediately regardless of the retry limit.
> 
> log is below:
> {noformat}
> 2016-11-16 17:00:20,844 (WARN org.apache.hadoop.ipc.Client 510): Address 
> change detected. Old: *:9000 New: X:9000
> 2016-11-16 17:01:09,893 (WARN org.apache.hadoop.ipc.Client 510): Address 
> change detected. Old: *::9000 New: X:9000
> 2016-11-16 17:01:09,893 (WARN 
> org.apache.hadoop.io.retry.RetryInvocationHandler 118): Exception while 
> invoking class 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo.
>  Not retrying because failovers (15) exceeded maximum allowed (15)
> {noformat}






[jira] [Commented] (HDFS-7337) Configurable and pluggable Erasure Codec and schema

2017-05-22 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020574#comment-16020574
 ] 

Andrew Wang commented on HDFS-7337:
---

Hi folks, thanks for the discussion,

bq. Is there a way to choose a system-wide default codec? So that after the 
cluster is initialized, users and admins can just specify a zone / directory 
to be "erasure coded", instead of choosing from several different codecs, each 
with its own trade-offs, which require the user / admin to understand?

We had a system default policy originally, but then moved away from it. I'm 
open to bringing it back if we believe that there's typically only one policy 
in a cluster. I think this is likely true.

bq. My concern is that, even if the admin is able to add policy via the API 
dynamically, it still requires the admin to reboot NN, or ssh into NN / change 
conf files and reload NN confs, to enable the policy? It makes the workflow 
complicated.

Yea, this is true. I can envision how this would work with just CLI commands: 
add/remove/enable/disable. I don't know how we'd do this with just config, 
since we want the safety of persisting things in the fsimage.

So, shall we do it all via API?

> Configurable and pluggable Erasure Codec and schema
> ---
>
> Key: HDFS-7337
> URL: https://issues.apache.org/jira/browse/HDFS-7337
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: erasure-coding
>Reporter: Zhe Zhang
>Priority: Critical
>  Labels: hdfs-ec-3.0-nice-to-have
> Attachments: HDFS-7337-prototype-v1.patch, 
> HDFS-7337-prototype-v2.zip, HDFS-7337-prototype-v3.zip, 
> PluggableErasureCodec.pdf, PluggableErasureCodec-v2.pdf, 
> PluggableErasureCodec-v3.pdf, PluggableErasureCodec v4.pdf
>
>
> According to HDFS-7285 and the design, this JIRA considers supporting 
> multiple erasure codecs via a pluggable approach. It allows defining and 
> configuring multiple codec schemas with different coding algorithms and 
> parameters. The resulting codec schemas can be utilized and specified via a 
> command tool for different file folders. While designing and implementing 
> such a pluggable framework, a concrete default codec (Reed-Solomon) should 
> also be implemented to prove the framework is useful and workable. A separate 
> JIRA could be opened for the RS codec implementation.
> Note HDFS-7353 will focus on the very low-level codec API and implementation 
> to make concrete vendor libraries transparent to the upper layer. This JIRA 
> focuses on the high-level pieces that interact with configuration, schemas, etc.






[jira] [Commented] (HDFS-11377) Balancer hung due to no available mover threads

2017-05-22 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020566#comment-16020566
 ] 

Brahma Reddy Battula commented on HDFS-11377:
-

[~shv] can you please update the CHANGES.txt in {{branch-2.7}}?

> Balancer hung due to no available mover threads
> ---
>
> Key: HDFS-11377
> URL: https://issues.apache.org/jira/browse/HDFS-11377
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.7.3
>Reporter: yunjiong zhao
>Assignee: yunjiong zhao
> Fix For: 2.9.0, 2.7.4, 3.0.0-alpha3, 2.8.2
>
> Attachments: HDFS-11377.001.patch, HDFS-11377.002.patch
>
>
> When running the balancer on a large cluster with more than 3000 Datanodes, 
> it might hang due to "No mover threads available".
> The stack trace shows it waiting forever, as below.
> {code}
> "main" #1 prio=5 os_prio=0 tid=0x7ff6cc014800 nid=0x6b2c waiting on 
> condition [0x7ff6d1bad000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher.waitForMoveCompletion(Dispatcher.java:1043)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchBlockMoves(Dispatcher.java:1017)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchAndCheckContinue(Dispatcher.java:981)
> at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.runOneIteration(Balancer.java:611)
> at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:663)
> at 
> org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:776)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:905)
> {code}
> In the log, there are lots of WARN about "No mover threads available".
> {quote}
> 2017-01-26 15:36:40,085 WARN 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher: No mover threads 
> available: skip moving blk_13700554102_1112815018180 with size=268435456 from 
> 10.115.67.137:50010:DISK to 10.140.21.55:50010:DISK through 
> 10.115.67.137:50010
> 2017-01-26 15:36:40,085 WARN 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher: No mover threads 
> available: skip moving blk_4009558842_1103118359883 with size=268435456 from 
> 10.115.67.137:50010:DISK to 10.140.21.55:50010:DISK through 
> 10.115.67.137:50010
> 2017-01-26 15:36:40,085 WARN 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher: No mover threads 
> available: skip moving blk_13881956058_1112996460026 with size=133509566 from 
> 10.115.67.137:50010:DISK to 10.140.21.55:50010:DISK through 10.115.67.36:50010
> {quote}
> What happened here is that when there are no mover threads available, 
> DDatanode.isPendingQEmpty() will return false, so the Balancer hangs.
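
A self-contained toy model of the hang; the names loosely mirror {{Dispatcher}} but this is not the real balancer code:

{code}
import java.util.concurrent.*;

public class BalancerHangSketch {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> pendingMoves = new LinkedBlockingQueue<>();
        pendingMoves.add("blk_13700554102_1112815018180");

        ExecutorService moverExecutor = Executors.newFixedThreadPool(1);
        moverExecutor.shutdown(); // simulate "No mover threads available"

        try {
            moverExecutor.execute(() -> pendingMoves.poll());
        } catch (RejectedExecutionException e) {
            // The skipped move never leaves the pending queue.
            System.out.println("No mover threads available: skip moving "
                + pendingMoves.peek());
        }

        // waitForMoveCompletion(): loops while the pending queue is non-empty.
        while (!pendingMoves.isEmpty()) {
            System.out.println("balancer waiting for moves to complete...");
            Thread.sleep(100);
            break; // break added only so this sketch terminates
        }
    }
}
{code}
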






[jira] [Updated] (HDFS-11851) getGlobalJNIEnv() may deadlock if exception is thrown

2017-05-22 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-11851:
---
Target Version/s: 3.0.0-alpha3

> getGlobalJNIEnv() may deadlock if exception is thrown
> -
>
> Key: HDFS-11851
> URL: https://issues.apache.org/jira/browse/HDFS-11851
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs
>Affects Versions: 3.0.0-alpha3
>Reporter: Henry Robinson
>Assignee: Sailesh Mukil
>Priority: Blocker
> Attachments: HDFS-11851.000.patch, HDFS-11851.001.patch
>
>
> HDFS-11529 introduced a deadlock into {{getGlobalJNIEnv()}} if an exception 
> is thrown. {{getGlobalJNIEnv()}} holds {{jvmMutex}}, but 
> {{printExceptionAndFree()}} will eventually try to acquire that lock in 
> {{setTLSExceptionStrings()}} (a toy model of this lock ordering follows the 
> stack trace below).
> The exception might get caught from {{loadFileSystems}}:
> {code}
> jthr = invokeMethod(env, NULL, STATIC, NULL,
>  "org/apache/hadoop/fs/FileSystem",
>  "loadFileSystems", "()V");
> if (jthr) {
> printExceptionAndFree(env, jthr, PRINT_EXC_ALL, 
> "loadFileSystems");
> }
> }
> {code}
> and here's the relevant parts of the stack trace from where I call this API 
> in Impala, which uses {{libhdfs}}:
> {code}
> #0  __lll_lock_wait () at 
> ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> #1  0x74a8d657 in _L_lock_909 () from 
> /lib/x86_64-linux-gnu/libpthread.so.0
> #2  0x74a8d480 in __GI___pthread_mutex_lock (mutex=0x47ce960 
> ) at ../nptl/pthread_mutex_lock.c:79
> #3  0x02f06056 in mutexLock (m=) at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/os/posix/mutexes.c:28
> #4  0x02efe817 in setTLSExceptionStrings (rootCause=0x0, 
> stackTrace=0x0) at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:581
> #5  0x02f065d7 in printExceptionAndFreeV (env=0x513c1e8, 
> exc=0x508a8c0, noPrintFlags=, fmt=0x34349cf "loadFileSystems", 
> ap=0x7fffb660)
> at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c:183
> #6  0x02f0683d in printExceptionAndFree (env=, 
> exc=, noPrintFlags=, fmt=)
> at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c:213
> #7  0x02eff60f in getGlobalJNIEnv () at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:463
> {code}
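
A Java stand-in for the lock ordering described above (the real code is C with a non-recursive pthread mutex; this sketch only models the self-deadlock):

{code}
import java.util.concurrent.Semaphore;

public class JniEnvDeadlockSketch {
    // A Semaphore models a non-reentrant mutex like pthread_mutex_t.
    static final Semaphore jvmMutex = new Semaphore(1);

    static void setTLSExceptionStrings() throws InterruptedException {
        jvmMutex.acquire(); // blocks forever: this thread already holds it
        try {
            // ...store thread-local exception strings...
        } finally {
            jvmMutex.release();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        jvmMutex.acquire();            // getGlobalJNIEnv() takes the lock
        boolean loadFileSystemsFailed = true;
        if (loadFileSystemsFailed) {
            System.out.println("error path would call setTLSExceptionStrings()");
            // setTLSExceptionStrings(); // uncomment to reproduce the hang
        }
        jvmMutex.release();
    }
}
{code}
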






[jira] [Commented] (HDFS-11808) Backport HDFS-8549 to branch-2.7: Abort the balancer if an upgrade is in progress

2017-05-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020550#comment-16020550
 ] 

Hadoop QA commented on HDFS-11808:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
20s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
9s{color} | {color:green} branch-2.7 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} branch-2.7 passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
13s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
58s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
58s{color} | {color:green} branch-2.7 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
43s{color} | {color:green} branch-2.7 passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 20s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 1 new + 58 unchanged - 3 fixed = 59 total (was 61) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1289 line(s) that end in whitespace. Use 
git apply --whitespace=fix <>. Refer 
https://git-scm.com/docs/git-apply {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m 
32s{color} | {color:red} The patch 70 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
46s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 48m 33s{color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_131. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}128m 21s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_131 Failed junit tests | 
hadoop.hdfs.server.namenode.TestFileTruncate |
|   | hadoop.hdfs.server.balancer.TestBalancer |
|   | hadoop.hdfs.server.namenode.ha.TestInitializeSharedEdits |
|   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
|   | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
| JDK v1.7.0_131 Failed junit tests | 

[jira] [Commented] (HDFS-11837) Backport HDFS-9710 to branch-2.7: Change DN to send block receipt IBRs in batches

2017-05-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020549#comment-16020549
 ] 

Hadoop QA commented on HDFS-11837:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
51s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
8s{color} | {color:green} branch-2.7 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green} branch-2.7 passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
56s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
15s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
51s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} branch-2.7 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
43s{color} | {color:green} branch-2.7 passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 33s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 3 new + 1143 unchanged - 5 fixed = 1146 total (was 1148) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1329 line(s) that end in whitespace. Use 
git apply --whitespace=fix <>. Refer 
https://git-scm.com/docs/git-apply {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m 
26s{color} | {color:red} The patch 73 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
32s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m  4s{color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_131. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}123m 21s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_131 Failed junit tests | 
hadoop.hdfs.server.datanode.TestBlockScanner |
|   | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.server.namenode.TestCheckpoint |
|   | hadoop.hdfs.server.namenode.snapshot.TestSnapshot |
|   | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
|   | hadoop.hdfs.server.namenode.TestCacheDirectives |
| JDK v1.7.0_131 Failed junit tests | 

[jira] [Commented] (HDFS-11078) NPE in LazyPersistFileScrubber

2017-05-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020551#comment-16020551
 ] 

Hadoop QA commented on HDFS-11078:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
53s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}102m 31s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}135m 19s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.balancer.TestBalancer |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestSpaceReservation |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure160 |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080 |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|   | hadoop.hdfs.server.namenode.TestDecommissioningStatus |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150 |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-11078 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12836383/HDFS-11078.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 186f06bf25a1 3.13.0-108-generic #155-Ubuntu SMP Wed Jan 11 
16:58:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 8e0f83e |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19548/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19548/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 

[jira] [Commented] (HDFS-11708) positional read will fail if replicas moved to different DNs after stream is opened

2017-05-22 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020545#comment-16020545
 ] 

Konstantin Shvachko commented on HDFS-11708:


Not sure I understand what you are trying to fix here. It looks like 
{{chooseDataNode()}} already calls {{refreshLocatedBlock()}} in the case when 
all other locations have failed. So your patch adds another call to the 
NameNode unconditionally on retry, which is probably not the best thing to do 
from a performance viewpoint.
It would be good if you could provide a unit test that fails without your fix.

> positional read will fail if replicas moved to different DNs after stream is 
> opened
> ---
>
> Key: HDFS-11708
> URL: https://issues.apache.org/jira/browse/HDFS-11708
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.7.3
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
>Priority: Critical
>  Labels: release-blocker
> Attachments: HDFS-11708-01.patch, HDFS-11708-02.patch, 
> HDFS-11708-03.patch, HDFS-11708-04.patch
>
>
> Scenario:
> 1. File was written to DN1 and DN2 with RF=2.
> 2. File stream opened for reading and kept open. Block locations are [DN1, DN2].
> 3. One of the replicas (DN2) was moved to another datanode (DN3) due to 
> datanode death/balancing/etc.
> 4. The latest block locations in the NameNode will be DN1 and DN3, in the 
> 'same order'.
> 5. DN1 went down, but is not yet detected as dead by the NameNode.
> 6. Client starts reading using the positional read API "read(pos, buf[], 
> offset, length)"






[jira] [Commented] (HDFS-11866) JournalNode Sync should be off by default in hdfs-default.xml

2017-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020542#comment-16020542
 ] 

Hudson commented on HDFS-11866:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11767 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11767/])
HDFS-11866. JournalNode Sync should be off by default in hdfs-default.xml. (arp: rev 
ca6bcc3c76babb2f7def1fd413d0917783224110)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml


> JournalNode Sync should be off by default in hdfs-default.xml
> -
>
> Key: HDFS-11866
> URL: https://issues.apache.org/jira/browse/HDFS-11866
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
> Fix For: 3.0.0-alpha3
>
> Attachments: HDFS-11866.001.patch
>
>
> dfs.journalnode.enable.sync is set to true in hdfs-default.xml. It should be 
> set to false to disable the feature by default, as discussed in HDFS-4025.
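For reference, a sketch of the resulting hdfs-default.xml entry (the description 
element is omitted here, and the exact surrounding text may differ):

{code}
<property>
  <name>dfs.journalnode.enable.sync</name>
  <value>false</value>
</property>
{code}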



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11535) Performance analysis of new DFSNetworkTopology#chooseRandom

2017-05-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020540#comment-16020540
 ] 

Hadoop QA commented on HDFS-11535:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m  0s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}104m  4s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|   | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover |
|   | hadoop.hdfs.TestDataTransferKeepalive |
|   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150 |
| Timed out junit tests | 
org.apache.hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-11535 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12869365/HDFS-11535.004.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 87182550a512 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 
15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 8e0f83e |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19552/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19552/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19552/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Performance analysis of new DFSNetworkTopology#chooseRandom
> 

[jira] [Commented] (HDFS-11849) JournalNode startup failure exception should be logged in log file

2017-05-22 Thread Surendra Singh Lilhore (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020531#comment-16020531
 ] 

Surendra Singh Lilhore commented on HDFS-11849:
---

Thanks [~brahmareddy] for review and commit.


> JournalNode startup failure exception should be logged in log file
> --
>
> Key: HDFS-11849
> URL: https://issues.apache.org/jira/browse/HDFS-11849
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: journal-node
>Affects Versions: 2.7.0
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
> Fix For: 2.9.0, 2.7.4, 3.0.0-alpha3, 2.8.1
>
> Attachments: HDFS-11849-001.patch, HDFS-11849-002.patch
>
>
> JournalNode failed to start because of kerberos login. 
> {noformat}
> Exception in thread "main" java.io.IOException: Login failure for 
> xxx/y...@.com from keytab dummy.keytab: 
> javax.security.auth.login.LoginException: host1
> at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:994)
> at 
> org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:281)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNode.start(JournalNode.java:153)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNode.run(JournalNode.java:132)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNode.main(JournalNode.java:318)
> {noformat}
> but this exception is not written in log file.
> {noformat}
> STARTUP_MSG:   java = 1.x.x
> /
> 2017-05-18 16:08:14,961 INFO 
> org.apache.hadoop.hdfs.qjournal.server.JournalNode: registered UNIX signal 
> handlers for [TERM, HUP, INT]
> 2017-05-18 16:08:15,511 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: 
> loaded properties from hadoop-metrics2.properties
> 2017-05-18 16:08:15,660 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period 
> at 10 second(s).
> 2017-05-18 16:08:15,660 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: JournalNode metrics system 
> started
> 2017-05-18 16:08:16,429 INFO 
> org.apache.hadoop.hdfs.qjournal.server.JournalNode: SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down JournalNode at w-x-y-z
> /
> {noformat}
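One possible shape for such a fix, shown as a hedged sketch (names follow the 
stack trace above and common Hadoop conventions; this is not necessarily the 
committed patch): catch the startup failure and write it to the log before 
terminating.

{code}
// Sketch: assumes a static LOG field on JournalNode, as is conventional.
public static void main(String[] args) {
  try {
    System.exit(ToolRunner.run(new JournalNode(), args));
  } catch (Throwable t) {
    // Write the failure to the JournalNode log file before exiting, so
    // errors such as the Kerberos login failure above are not lost.
    LOG.error("Failed to start JournalNode.", t);
    ExitUtil.terminate(-1, t);
  }
}
{code}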



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11383) String duplication in org.apache.hadoop.fs.BlockLocation

2017-05-22 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020515#comment-16020515
 ] 

Andrew Wang commented on HDFS-11383:


Hi Misha, could you provide some indicative comparison numbers for this change? 
I get that it's safe, but it'd be good to document the expected improvement in 
heap usage. This should be doable with a unit test if you can't revive your 
test cluster.

> String duplication in org.apache.hadoop.fs.BlockLocation
> 
>
> Key: HDFS-11383
> URL: https://issues.apache.org/jira/browse/HDFS-11383
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: HDFS-11383.01.patch
>
>
> I am working on Hive performance, investigating the problem of high memory 
> pressure when (a) a table consists of a high number (thousands) of partitions 
> and (b) multiple queries run against it concurrently. It turns out that a lot 
> of memory is wasted due to data duplication. One source of duplicate strings 
> is class org.apache.hadoop.fs.BlockLocation. Its fields such as storageIds, 
> topologyPaths, hosts, names, may collectively use up to 6% of memory in my 
> benchmark, causing (together with other problematic classes) a huge memory 
> spike. Of these 6% of memory taken by BlockLocation strings, more than 5% are 
> wasted due to duplication.
> I think we need to add calls to String.intern() in the BlockLocation 
> constructor, like:
> {code}
> this.hosts = internStringsInArray(hosts);
> ...
> private String[] internStringsInArray(String[] sar) {
>   for (int i = 0; i < sar.length; i++) {
>     sar[i] = sar[i].intern();
>   }
>   return sar;
> }
> {code}
> String.intern() performs very well starting from JDK 7. I've found some 
> articles explaining the progress that was made by the HotSpot JVM developers 
> in this area, verified that with benchmarks myself, and finally added quite a 
> bit of interning to one of the Cloudera products without any issues.
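As a minimal, self-contained illustration of the dedup effect (the host name 
below is made up; this is not the patch itself): equal-content strings collapse 
to a single canonical instance after {{intern()}}, so only one copy stays on the 
heap.

{code}
public class InternDemo {
  public static void main(String[] args) {
    String a = new String("datanode-17.example.com");
    String b = new String("datanode-17.example.com");
    System.out.println(a == b);                    // false: two heap copies
    System.out.println(a.intern() == b.intern());  // true: one shared copy
  }
}
{code}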



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11817) A faulty node can cause a lease leak and NPE on accessing data

2017-05-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020510#comment-16020510
 ] 

Hadoop QA commented on HDFS-11817:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
13s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 36s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 7 new + 271 unchanged - 4 fixed = 278 total (was 275) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 72m 11s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}100m 47s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080 |
|   | hadoop.hdfs.TestDFSRSDefault10x4StripedOutputStreamWithFailure |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-11817 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12869359/HDFS-11817.v2.trunk.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux f5667cdf52bd 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 
09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 8e0f83e |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19549/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19549/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19549/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19549/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


[jira] [Updated] (HDFS-11377) Balancer hung due to no available mover threads

2017-05-22 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-11377:
---
Target Version/s: 2.9.0, 2.7.4, 3.0.0-alpha3  (was: 2.9.0, 3.0.0-alpha3)
   Fix Version/s: 2.8.2
  2.7.4

Merged this into branch-2.8 and branch-2.7. Changing fix version.

> Balancer hung due to no available mover threads
> ---
>
> Key: HDFS-11377
> URL: https://issues.apache.org/jira/browse/HDFS-11377
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.7.3
>Reporter: yunjiong zhao
>Assignee: yunjiong zhao
> Fix For: 2.9.0, 2.7.4, 3.0.0-alpha3, 2.8.2
>
> Attachments: HDFS-11377.001.patch, HDFS-11377.002.patch
>
>
> When running the balancer on a large cluster with more than 3000 Datanodes, 
> it might hang due to "No mover threads available".
> The stack trace shows it waiting forever, like below.
> {code}
> "main" #1 prio=5 os_prio=0 tid=0x7ff6cc014800 nid=0x6b2c waiting on 
> condition [0x7ff6d1bad000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher.waitForMoveCompletion(Dispatcher.java:1043)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchBlockMoves(Dispatcher.java:1017)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchAndCheckContinue(Dispatcher.java:981)
> at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.runOneIteration(Balancer.java:611)
> at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:663)
> at 
> org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:776)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:905)
> {code}
> In the log, there are lots of WARN messages about "No mover threads available".
> {quote}
> 2017-01-26 15:36:40,085 WARN 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher: No mover threads 
> available: skip moving blk_13700554102_1112815018180 with size=268435456 from 
> 10.115.67.137:50010:DISK to 10.140.21.55:50010:DISK through 
> 10.115.67.137:50010
> 2017-01-26 15:36:40,085 WARN 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher: No mover threads 
> available: skip moving blk_4009558842_1103118359883 with size=268435456 from 
> 10.115.67.137:50010:DISK to 10.140.21.55:50010:DISK through 
> 10.115.67.137:50010
> 2017-01-26 15:36:40,085 WARN 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher: No mover threads 
> available: skip moving blk_13881956058_1112996460026 with size=133509566 from 
> 10.115.67.137:50010:DISK to 10.140.21.55:50010:DISK through 10.115.67.36:50010
> {quote}
> What happened here is: when there are no mover threads available, 
> DDatanode.isPendingQEmpty() will return false, so the Balancer hangs.
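For illustration, a simplified sketch of the wait-loop shape described above 
(names follow the stack trace and description; this is not the actual Dispatcher 
source):

{code}
// If no mover thread ever picks up a queued move, isPendingQEmpty() stays
// false, the loop below never exits, and the Balancer hangs. A bounded
// wait (or failing the iteration) would avoid this.
private static boolean waitForMoveCompletion(Iterable<DDatanode> targets)
    throws InterruptedException {
  while (true) {
    boolean allEmpty = true;
    for (DDatanode dn : targets) {
      if (!dn.isPendingQEmpty()) {
        allEmpty = false;
        break;
      }
    }
    if (allEmpty) {
      return true;
    }
    Thread.sleep(1000L);  // the TIMED_WAITING sleep in the stack trace
  }
}
{code}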



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11866) JournalNode Sync should be off by default in hdfs-default.xml

2017-05-22 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-11866:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0-alpha3
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks for catching and fixing this [~hanishakoneru].

> JournalNode Sync should be off by default in hdfs-default.xml
> -
>
> Key: HDFS-11866
> URL: https://issues.apache.org/jira/browse/HDFS-11866
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
> Fix For: 3.0.0-alpha3
>
> Attachments: HDFS-11866.001.patch
>
>
> dfs.journalnode.enable.sync is set to true in hdfs-default.xml. It should be 
> set to false to disable the feature by default, as discussed in HDFS-4025.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11865) Ozone: Do not initialize Ratis cluster during datanode startup

2017-05-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020485#comment-16020485
 ] 

Hadoop QA commented on HDFS-11865:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m  
7s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
49s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
59s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
12s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
59s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
14s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  1m 
 6s{color} | {color:green} HDFS-7240 passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
47s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs-client in HDFS-7240 
has 2 extant Findbugs warnings. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  2m  
5s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in HDFS-7240 has 10 
extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
5s{color} | {color:green} HDFS-7240 passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
26s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 
43s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 59s{color} | {color:orange} root: The patch generated 2 new + 1 unchanged - 
0 fixed = 3 total (was 1) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  1m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
13s{color} | {color:green} hadoop-project in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
29s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}115m 43s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  1m 
42s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}196m 59s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.TestDFSRSDefault10x4StripedOutputStreamWithFailure |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure090 |
|   | hadoop.ozone.container.common.impl.TestContainerPersistence |
|   | 

[jira] [Updated] (HDFS-11597) Ozone: Add Ratis management API

2017-05-22 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-11597:
---
Attachment: HDFS-11597-HDFS-7240.20170522.patch

HDFS-11597-HDFS-7240.20170522.patch: adds create and close APIs.

Note that the patch requires HDFS-11865.

> Ozone: Add Ratis management API
> ---
>
> Key: HDFS-11597
> URL: https://issues.apache.org/jira/browse/HDFS-11597
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: HDFS-11597-HDFS-7240.20170522.patch
>
>
> We need an API to manage raft clusters, e.g.
> - RaftClusterId createRaftCluster(MembershipConfiguration)
> - void closeRaftCluster(RaftClusterId)
> - MembershipConfiguration getMembers(RaftClusterId)
> - void changeMembership(RaftClusterId, newMembershipConfiguration)
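A hedged sketch of what such an interface could look like (the interface name, 
exceptions, and parameter shapes are assumptions; the method names come from the 
list above):

{code}
public interface RaftClusterManager {
  RaftClusterId createRaftCluster(MembershipConfiguration members)
      throws IOException;
  void closeRaftCluster(RaftClusterId clusterId) throws IOException;
  MembershipConfiguration getMembers(RaftClusterId clusterId)
      throws IOException;
  void changeMembership(RaftClusterId clusterId,
      MembershipConfiguration newMembers) throws IOException;
}
{code}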



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11742) Improve balancer usability after HDFS-8818

2017-05-22 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020458#comment-16020458
 ] 

Konstantin Shvachko commented on HDFS-11742:


[~kihwal] do you have a graph showing Balancer performance 
(metric: copyblockoperationspersec) with your patch? For comparison with what 
you posted earlier.

> Improve balancer usability after HDFS-8818
> --
>
> Key: HDFS-11742
> URL: https://issues.apache.org/jira/browse/HDFS-11742
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
>  Labels: release-blocker
> Attachments: balancer2.8.png, HDFS-11742.branch-2.8.patch, 
> HDFS-11742.branch-2.patch, HDFS-11742.trunk.patch, HDFS-11742.v2.trunk.patch
>
>
> We ran 2.8 balancer with HDFS-8818 on a 280-node and a 2,400-node cluster. In 
> both cases, it would hang forever after two iterations. The two iterations 
> were also moving things at a significantly lower rate. The hang itself is 
> fixed by HDFS-11377, but the design limitation remains, so the balancer 
> throughput ends up actually lower.
> Instead of reverting HDFS-8818 as originally suggested, I am making a small 
> change to make it less error-prone and more usable.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11866) JournalNode Sync should be off by default in hdfs-default.xml

2017-05-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020454#comment-16020454
 ] 

Hadoop QA commented on HDFS-11866:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 69m 10s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 88m 35s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
|   | hadoop.hdfs.server.datanode.TestFsDatasetCache |
|   | hadoop.hdfs.web.TestWebHdfsTimeouts |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-11866 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12869334/HDFS-11866.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  xml  |
| uname | Linux cf48a0d26bc2 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 8e0f83e |
| Default Java | 1.8.0_131 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19547/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19547/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19547/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> JournalNode Sync should be off by default in hdfs-default.xml
> -
>
> Key: HDFS-11866
> URL: https://issues.apache.org/jira/browse/HDFS-11866
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
> Attachments: HDFS-11866.001.patch
>
>
> dfs.journalnode.enable.sync is set to true in hdfs-default.xml. It should be 
> set to false to disable the feature by default, as discussed in HDFS-4025.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HDFS-10480) Add an admin command to list currently open files

2017-05-22 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020452#comment-16020452
 ] 

Andrew Wang commented on HDFS-10480:


Thanks for working on this, Manoj. Looks good overall!

One high-level question first: what do we envision as the use cases for this 
command? I figured it was for:

# Debugging lease manager state
# Finding open files that are blocking decommission

To do the first, we probably shouldn't skip erroneous leases:

{code}
  if (!inodeFile.isUnderConstruction()) {
LOG.warn("The file " + inodeFile.getFullPathName()
+ " is not under construction but has lease.");
continue;
  }
{code}

The admin invoking the command also won't see this WARN since it goes to the NN 
log, not the client log. The log is still a bit useful, but there should be 
some non-NN-log way for admins to debug erroneous state here. I guess they can 
cross-check with fsck information?

For the second, the admin is wondering why some DN hasn't finished decomming 
yet, and wants to find the UC blocks and the client and path. It looks like 
HDFS-11847 will make this easy, without needing to resort to fsck. Nice.

But what's the workflow where we need HDFS-11848? This new command is much 
lighter weight than {{fsck -openforwrite}}, so I'd like to encourage users to 
use the new command instead. Just wondering, before we add some new 
functionality.

Some review comments:

* Maybe bump the NUM_RESPONSES limit to 1000, to match {{DFS_LIST_LIMIT}}?
* Should the precondition check for {{NUM_RESPONSES}} check for {{> 0}} rather 
than {{>= 0}}? FWIW, {{0}} is also not a positive integer.
* Based on HDFS-9395, we should only generate an audit event when the op is 
successful, or fails due to an ACE. Notably, it should not log for things like 
an IOE.
* {{LeaseManager#getUnderConstructionFiles}} makes a new TreeMap out of 
{{leasesById}}. This is potentially a lot of garbage. Can we make 
{{leasesById}} a TreeMap instead to avoid this? TreeMaps still have pretty good 
performance. (See the sketch at the end of this comment.)
* Can we also add an assert that the FSN read lock is held?

Testing:
* I like the step-up/step-down with the open and closed file sets. Could we 
take the verification one step further, and do it in a for-loop? This way we 
test all the way from {{0..numOpenFiles}} rather than just at {{numOpenFiles}} 
and {{numOpenFiles/2}}.
* testListOpenFilesInHA, it'd be nice to see what happens when there's a 
failover between batches while iterating. I also suggest perhaps moving this 
into {{TestListOpenFiles}} since it doesn't really relate to append.
* Do we have any tests for the {{HdfsAdmin}} API? It'd be better to test 
against this than the one in {{DistributedFileSystem}}, since our end users 
will be programming against {{HdfsAdmin}}.
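As a hedged sketch of the TreeMap suggestion above ({{Lease}} and the lock 
assertion are simplified stand-ins, not the real {{LeaseManager}} internals): 
keeping the map sorted up front means each batched call can return a live 
{{tailMap}} view instead of copying every lease into a fresh TreeMap per call.

{code}
private final TreeMap<Long, Lease> leasesById = new TreeMap<>();

/** Leases for inode ids strictly greater than prevLastInodeId. */
Collection<Lease> getUnderConstructionFiles(long prevLastInodeId) {
  assert fsnamesystem.hasReadLock();  // the read-lock assertion suggested above
  // A live, sorted view -- no per-call TreeMap allocation.
  return leasesById.tailMap(prevLastInodeId, false).values();
}
{code}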

> Add an admin command to list currently open files
> -
>
> Key: HDFS-10480
> URL: https://issues.apache.org/jira/browse/HDFS-10480
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Kihwal Lee
>Assignee: Manoj Govindassamy
> Attachments: HDFS-10480.02.patch, HDFS-10480.03.patch, 
> HDFS-10480.04.patch, HDFS-10480-trunk-1.patch, HDFS-10480-trunk.patch
>
>
> Currently there is no easy way to obtain the list of active leases or files 
> being written. It will be nice if we have an admin command to list open files 
> and their lease holders.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11859) Ozone : separate blockLocationProtocol out of containerLocationProtocol

2017-05-22 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-11859:
--
Attachment: HDFS-11859-HDFS-7240.006.patch

Fixes the block service RPC port issue.

> Ozone : separate blockLocationProtocol out of containerLocationProtocol
> ---
>
> Key: HDFS-11859
> URL: https://issues.apache.org/jira/browse/HDFS-11859
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Chen Liang
>Assignee: Xiaoyu Yao
> Attachments: HDFS-11859-HDFS-7240.001.patch, 
> HDFS-11859-HDFS-7240.002.patch, HDFS-11859-HDFS-7240.003.patch, 
> HDFS-11859-HDFS-7240.004.patch, HDFS-11859-HDFS-7240.005.patch, 
> HDFS-11859-HDFS-7240.006.patch
>
>
> Currently StorageContainerLocationProtocol contains two types of operations: 
> container-related operations and block-related operations. Although there is 
> {{ScmBlockLocationProtocol}} for block operations, only 
> {{StorageContainerLocationProtocolServerSideTranslatorPB}} makes the 
> distinction. 
> This JIRA tries to make the separation complete and thorough in all places.
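For illustration only, the end state could look roughly like the sketch below 
(the two interface names come from the description; the method signatures are 
hypothetical placeholders, not the actual SCM protocol definitions):

{code}
// Illustrative split: container operations and block operations live behind
// separate protocol interfaces, each with its own server-side translator
// and RPC endpoint. Method names here are made up for the example.
public interface StorageContainerLocationProtocol {
  Pipeline allocateContainer(String containerName) throws IOException;
  void deleteContainer(String containerName) throws IOException;
}

public interface ScmBlockLocationProtocol {
  AllocatedBlock allocateBlock(long size) throws IOException;
  Set<AllocatedBlock> getBlockLocations(Set<String> keys) throws IOException;
}
{code}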



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-7134) Replication count for a block should not update till the blocks have settled on Datanodes

2017-05-22 Thread gurmukh singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gurmukh singh updated HDFS-7134:

Component/s: datanode

> Replication count for a block should not update till the blocks have settled 
> on Datanodes
> -
>
> Key: HDFS-7134
> URL: https://issues.apache.org/jira/browse/HDFS-7134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs
>Affects Versions: 1.2.1, 2.6.0, 2.7.3
> Environment: Linux nn1.cluster1.com 2.6.32-431.20.3.el6.x86_64 #1 SMP 
> Thu Jun 19 21:14:45 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
> [hadoop@nn1 conf]$ cat /etc/redhat-release
> CentOS release 6.5 (Final)
>Reporter: gurmukh singh
>Priority: Critical
>  Labels: HDFS
>
> The replica count for a block should not change until the blocks have 
> settled on the datanodes.
> Test Case:
> Hadoop Cluster with 1 namenode and 3 datanodes.
> nn1.cluster1.com(192.168.1.70)
> dn1.cluster1.com(192.168.1.72)
> dn2.cluster1.com(192.168.1.73)
> dn3.cluster1.com(192.168.1.74)
> Cluster up and running fine with replication set to "1" via the 
> "dfs.replication" parameter on all nodes:
> <property>
>   <name>dfs.replication</name>
>   <value>1</value>
> </property>
> To reduce the wait time, the dfs.heartbeat and recheck parameters have been 
> reduced.
> on datanode2 (192.168.1.72)
> [hadoop@dn2 ~]$ hadoop fs -Ddfs.replication=2 -put from_dn2 /
> [hadoop@dn2 ~]$ hadoop fs -ls /from_dn2
> Found 1 items
> -rw-r--r--   2 hadoop supergroup 17 2014-09-23 13:33 /from_dn2
> On Namenode
> ===
> As expected, the copy was done from datanode2, so one replica goes locally.
> [hadoop@nn1 conf]$ hadoop fsck /from_dn2 -files -blocks -locations
> FSCK started by hadoop from /192.168.1.70 for path /from_dn2 at Tue Sep 23 
> 13:53:16 IST 2014
> /from_dn2 17 bytes, 1 block(s):  OK
> 0. blk_8132629811771280764_1175 len=17 repl=2 [192.168.1.74:50010, 
> 192.168.1.73:50010]
> Can see the blocks on the data nodes disks as well under the "current" 
> directory.
> Now, shut down datanode2 (192.168.1.73); as expected, the block moves to another 
> datanode to maintain a replication of 2.
> [hadoop@nn1 conf]$ hadoop fsck /from_dn2 -files -blocks -locations
> FSCK started by hadoop from /192.168.1.70 for path /from_dn2 at Tue Sep 23 
> 13:54:21 IST 2014
> /from_dn2 17 bytes, 1 block(s):  OK
> 0. blk_8132629811771280764_1175 len=17 repl=2 [192.168.1.74:50010, 
> 192.168.1.72:50010]
> But now, if I bring back datanode2, the namenode sees that this block is in 3 
> places and fires an invalidate command for datanode1 (192.168.1.72), yet the 
> replication count on the namenode is bumped to 3 immediately.
> [hadoop@nn1 conf]$ hadoop fsck /from_dn2 -files -blocks -locations
> FSCK started by hadoop from /192.168.1.70 for path /from_dn2 at Tue Sep 23 
> 13:56:12 IST 2014
> /from_dn2 17 bytes, 1 block(s):  OK
> 0. blk_8132629811771280764_1175 len=17 repl=3 [192.168.1.74:50010, 
> 192.168.1.72:50010, 192.168.1.73:50010]
> On datanode1, the invalidate command was fired immediately and the block 
> was deleted.
> =
> 2014-09-23 13:54:17,483 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Receiving blk_8132629811771280764_1175 src: /192.168.1.74:38099 dest: 
> /192.168.1.72:50010
> 2014-09-23 13:54:17,502 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Received blk_8132629811771280764_1175 src: /192.168.1.74:38099 dest: 
> /192.168.1.72:50010 size 17
> 2014-09-23 13:55:28,720 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Scheduling blk_8132629811771280764_1175 file 
> /space/disk1/current/blk_8132629811771280764 for deletion
> 2014-09-23 13:55:28,721 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Deleted blk_8132629811771280764_1175 at file 
> /space/disk1/current/blk_8132629811771280764
> The namenode still shows 3 replicas, even though one has been deleted, even 
> after more than 30 minutes.
> [hadoop@nn1 conf]$ hadoop fsck /from_dn2 -files -blocks -locations
> FSCK started by hadoop from /192.168.1.70 for path /from_dn2 at Tue Sep 23 
> 14:21:27 IST 2014
> /from_dn2 17 bytes, 1 block(s):  OK
> 0. blk_8132629811771280764_1175 len=17 repl=3 [192.168.1.74:50010, 
> 192.168.1.72:50010, 192.168.1.73:50010]
> This could be dangerous if a replica is removed or the other 2 datanodes fail.
> On Datanode 1
> =
> Before datanode1 is brought back:
> [hadoop@dn1 conf]$ ls -l /space/disk*/current
> /space/disk1/current:
> total 28
> -rw-rw-r-- 1 hadoop hadoop   13 Sep 21 09:09 blk_2278001646987517832
> -rw-rw-r-- 1 hadoop hadoop   11 Sep 21 09:09 blk_2278001646987517832_1171.meta
> -rw-rw-r-- 1 hadoop hadoop   17 Sep 23 13:54 blk_8132629811771280764
> -rw-rw-r-- 1 hadoop hadoop   11 Sep 23 13:54 blk_8132629811771280764_1175.meta
> -rw-rw-r-- 1 hadoop 

[jira] [Commented] (HDFS-7134) Replication count for a block should not update till the blocks have settled on Datanodes

2017-05-22 Thread gurmukh singh (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020441#comment-16020441
 ] 

gurmukh singh commented on HDFS-7134:
-

This behavior is still seen in Hadoop 2.7.3. Can someone look into this?

> Replication count for a block should not update till the blocks have settled 
> on Datanodes
> -
>
> Key: HDFS-7134
> URL: https://issues.apache.org/jira/browse/HDFS-7134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 1.2.1, 2.6.0, 2.7.3
> Environment: Linux nn1.cluster1.com 2.6.32-431.20.3.el6.x86_64 #1 SMP 
> Thu Jun 19 21:14:45 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
> [hadoop@nn1 conf]$ cat /etc/redhat-release
> CentOS release 6.5 (Final)
>Reporter: gurmukh singh
>Priority: Critical
>  Labels: HDFS
>

[jira] [Updated] (HDFS-7134) Replication count for a block should not update till the blocks have settled on Datanodes

2017-05-22 Thread gurmukh singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gurmukh singh updated HDFS-7134:

Labels: HDFS  (was: )

> Replication count for a block should not update till the blocks have settled 
> on Datanodes
> -
>
> Key: HDFS-7134
> URL: https://issues.apache.org/jira/browse/HDFS-7134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 1.2.1, 2.6.0, 2.7.3
> Environment: Linux nn1.cluster1.com 2.6.32-431.20.3.el6.x86_64 #1 SMP 
> Thu Jun 19 21:14:45 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
> [hadoop@nn1 conf]$ cat /etc/redhat-release
> CentOS release 6.5 (Final)
>Reporter: gurmukh singh
>Priority: Critical
>  Labels: HDFS
>

[jira] [Updated] (HDFS-7134) Replication count for a block should not update till the blocks have settled on Datanodes

2017-05-22 Thread gurmukh singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gurmukh singh updated HDFS-7134:

Component/s: hdfs

> Replication count for a block should not update till the blocks have settled 
> on Datanodes
> -
>
> Key: HDFS-7134
> URL: https://issues.apache.org/jira/browse/HDFS-7134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 1.2.1, 2.6.0, 2.7.3
> Environment: Linux nn1.cluster1.com 2.6.32-431.20.3.el6.x86_64 #1 SMP 
> Thu Jun 19 21:14:45 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
> [hadoop@nn1 conf]$ cat /etc/redhat-release
> CentOS release 6.5 (Final)
>Reporter: gurmukh singh
>Priority: Critical
>  Labels: HDFS
>
> The count for the number of replica's for a block should not change till the 
> blocks have settled on the datanodes.
> Test Case:
> Hadoop Cluster with 1 namenode and 3 datanodes.
> nn1.cluster1.com(192.168.1.70)
> dn1.cluster1.com(192.168.1.72)
> dn2.cluster1.com(192.168.1.73)
> dn3.cluster1.com(192.168.1.74)
> Cluster up and running fine with replication set to "1" for parameter 
> "dfs.replication on all nodes"
> 
> dfs.replication
> 1
> 
> To reduce the wait time, have reduced the dfs.heartbeat and recheck 
> parameters.
> on datanode2 (192.168.1.72)
> [hadoop@dn2 ~]$ hadoop fs -Ddfs.replication=2 -put from_dn2 /
> [hadoop@dn2 ~]$ hadoop fs -ls /from_dn2
> Found 1 items
> -rw-r--r--   2 hadoop supergroup 17 2014-09-23 13:33 /from_dn2
> On Namenode
> ===
> As expected, copy was done from datanode2, one copy will go locally.
> [hadoop@nn1 conf]$ hadoop fsck /from_dn2 -files -blocks -locations
> FSCK started by hadoop from /192.168.1.70 for path /from_dn2 at Tue Sep 23 
> 13:53:16 IST 2014
> /from_dn2 17 bytes, 1 block(s):  OK
> 0. blk_8132629811771280764_1175 len=17 repl=2 [192.168.1.74:50010, 
> 192.168.1.73:50010]
> Can see the blocks on the data nodes disks as well under the "current" 
> directory.
> Now, shutdown datanode2(192.168.1.73) and as expected block moves to another 
> datanode to maintain a replication of 2
> [hadoop@nn1 conf]$ hadoop fsck /from_dn2 -files -blocks -locations
> FSCK started by hadoop from /192.168.1.70 for path /from_dn2 at Tue Sep 23 
> 13:54:21 IST 2014
> /from_dn2 17 bytes, 1 block(s):  OK
> 0. blk_8132629811771280764_1175 len=17 repl=2 [192.168.1.74:50010, 
> 192.168.1.72:50010]
> But, now if i bring back the datanode2, and although the namenode see that 
> this block is at 3 places now and fires a invalidate command for 
> datanode1(192.168.1.72) but the replication on the namenode is bumped to 3 
> immediately.
> [hadoop@nn1 conf]$ hadoop fsck /from_dn2 -files -blocks -locations
> FSCK started by hadoop from /192.168.1.70 for path /from_dn2 at Tue Sep 23 
> 13:56:12 IST 2014
> /from_dn2 17 bytes, 1 block(s):  OK
> 0. blk_8132629811771280764_1175 len=17 repl=3 [192.168.1.74:50010, 
> 192.168.1.72:50010, 192.168.1.73:50010]
> on Datanode1 - The invalidate command has been fired immediately and the 
> block deleted.
> =
> 2014-09-23 13:54:17,483 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Receiving blk_8132629811771280764_1175 src: /192.168.1.74:38099 dest: 
> /192.168.1.72:50010
> 2014-09-23 13:54:17,502 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Received blk_8132629811771280764_1175 src: /192.168.1.74:38099 dest: 
> /192.168.1.72:50010 size 17
> 2014-09-23 13:55:28,720 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Scheduling blk_8132629811771280764_1175 file 
> /space/disk1/current/blk_8132629811771280764 for deletion
> 2014-09-23 13:55:28,721 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Deleted blk_8132629811771280764_1175 at file 
> /space/disk1/current/blk_8132629811771280764
> The namenode still shows 3 replicas, even though one has been deleted, even
> after more than 30 minutes.
> [hadoop@nn1 conf]$ hadoop fsck /from_dn2 -files -blocks -locations
> FSCK started by hadoop from /192.168.1.70 for path /from_dn2 at Tue Sep 23 
> 14:21:27 IST 2014
> /from_dn2 17 bytes, 1 block(s):  OK
> 0. blk_8132629811771280764_1175 len=17 repl=3 [192.168.1.74:50010, 
> 192.168.1.72:50010, 192.168.1.73:50010]
> This could be dangerous if someone removes a node or the other 2 datanodes fail.
> On Datanode 1
> =
> Before datanode2 is brought back:
> [hadoop@dn1 conf]$ ls -l /space/disk*/current
> /space/disk1/current:
> total 28
> -rw-rw-r-- 1 hadoop hadoop   13 Sep 21 09:09 blk_2278001646987517832
> -rw-rw-r-- 1 hadoop hadoop   11 Sep 21 09:09 blk_2278001646987517832_1171.meta
> -rw-rw-r-- 1 hadoop hadoop   17 Sep 23 13:54 blk_8132629811771280764
> -rw-rw-r-- 1 hadoop hadoop   11 Sep 23 13:54 blk_8132629811771280764_1175.meta
> -rw-rw-r-- 1 hadoop hadoop 5299 Sep 21 10:04 dncp_block_verification.log.curr

[jira] [Updated] (HDFS-7134) Replication count for a block should not update till the blocks have settled on Datanodes

2017-05-22 Thread gurmukh singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gurmukh singh updated HDFS-7134:

Affects Version/s: 2.7.3

> Replication count for a block should not update till the blocks have settled 
> on Datanodes
> -
>
> Key: HDFS-7134
> URL: https://issues.apache.org/jira/browse/HDFS-7134
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.6.0, 2.7.3
> Environment: Linux nn1.cluster1.com 2.6.32-431.20.3.el6.x86_64 #1 SMP 
> Thu Jun 19 21:14:45 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
> [hadoop@nn1 conf]$ cat /etc/redhat-release
> CentOS release 6.5 (Final)
>Reporter: gurmukh singh
>Priority: Critical
>
> The replica count for a block should not change until the blocks have settled
> on the datanodes.
> Test Case:
> Hadoop Cluster with 1 namenode and 3 datanodes.
> nn1.cluster1.com(192.168.1.70)
> dn1.cluster1.com(192.168.1.72)
> dn2.cluster1.com(192.168.1.73)
> dn3.cluster1.com(192.168.1.74)
> Cluster up and running fine with replication set to "1" for the parameter
> "dfs.replication" on all nodes:
> <property>
>   <name>dfs.replication</name>
>   <value>1</value>
> </property>
> To reduce the wait time, I have reduced the dfs.heartbeat and recheck
> parameters.
> on datanode2 (192.168.1.72)
> [hadoop@dn2 ~]$ hadoop fs -Ddfs.replication=2 -put from_dn2 /
> [hadoop@dn2 ~]$ hadoop fs -ls /from_dn2
> Found 1 items
> -rw-r--r--   2 hadoop supergroup 17 2014-09-23 13:33 /from_dn2
> On Namenode
> ===
> As expected, since the copy was done from datanode2, one replica is placed locally.
> [hadoop@nn1 conf]$ hadoop fsck /from_dn2 -files -blocks -locations
> FSCK started by hadoop from /192.168.1.70 for path /from_dn2 at Tue Sep 23 
> 13:53:16 IST 2014
> /from_dn2 17 bytes, 1 block(s):  OK
> 0. blk_8132629811771280764_1175 len=17 repl=2 [192.168.1.74:50010, 
> 192.168.1.73:50010]
> I can see the blocks on the datanodes' disks as well, under the "current"
> directory.
> Now, shut down datanode2 (192.168.1.73); as expected, the block is re-replicated
> to another datanode to maintain a replication factor of 2.
> [hadoop@nn1 conf]$ hadoop fsck /from_dn2 -files -blocks -locations
> FSCK started by hadoop from /192.168.1.70 for path /from_dn2 at Tue Sep 23 
> 13:54:21 IST 2014
> /from_dn2 17 bytes, 1 block(s):  OK
> 0. blk_8132629811771280764_1175 len=17 repl=2 [192.168.1.74:50010, 
> 192.168.1.72:50010]
> But now, if I bring back datanode2, the namenode sees that this block is in 3
> places and fires an invalidate command for datanode1 (192.168.1.72), yet the
> replication count on the namenode is bumped to 3 immediately.
> [hadoop@nn1 conf]$ hadoop fsck /from_dn2 -files -blocks -locations
> FSCK started by hadoop from /192.168.1.70 for path /from_dn2 at Tue Sep 23 
> 13:56:12 IST 2014
> /from_dn2 17 bytes, 1 block(s):  OK
> 0. blk_8132629811771280764_1175 len=17 repl=3 [192.168.1.74:50010, 
> 192.168.1.72:50010, 192.168.1.73:50010]
> On Datanode1 - the invalidate command is executed immediately and the block is
> deleted.
> =
> 2014-09-23 13:54:17,483 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Receiving blk_8132629811771280764_1175 src: /192.168.1.74:38099 dest: 
> /192.168.1.72:50010
> 2014-09-23 13:54:17,502 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Received blk_8132629811771280764_1175 src: /192.168.1.74:38099 dest: 
> /192.168.1.72:50010 size 17
> 2014-09-23 13:55:28,720 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Scheduling blk_8132629811771280764_1175 file 
> /space/disk1/current/blk_8132629811771280764 for deletion
> 2014-09-23 13:55:28,721 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Deleted blk_8132629811771280764_1175 at file 
> /space/disk1/current/blk_8132629811771280764
> The namenode still shows 3 replicas, even though one has been deleted, even
> after more than 30 minutes.
> [hadoop@nn1 conf]$ hadoop fsck /from_dn2 -files -blocks -locations
> FSCK started by hadoop from /192.168.1.70 for path /from_dn2 at Tue Sep 23 
> 14:21:27 IST 2014
> /from_dn2 17 bytes, 1 block(s):  OK
> 0. blk_8132629811771280764_1175 len=17 repl=3 [192.168.1.74:50010, 
> 192.168.1.72:50010, 192.168.1.73:50010]
> This could be dangerous if someone removes a node or the other 2 datanodes fail.
> On Datanode 1
> =
> Before datanode2 is brought back:
> [hadoop@dn1 conf]$ ls -l /space/disk*/current
> /space/disk1/current:
> total 28
> -rw-rw-r-- 1 hadoop hadoop   13 Sep 21 09:09 blk_2278001646987517832
> -rw-rw-r-- 1 hadoop hadoop   11 Sep 21 09:09 blk_2278001646987517832_1171.meta
> -rw-rw-r-- 1 hadoop hadoop   17 Sep 23 13:54 blk_8132629811771280764
> -rw-rw-r-- 1 hadoop hadoop   11 Sep 23 13:54 blk_8132629811771280764_1175.meta
> -rw-rw-r-- 1 hadoop hadoop 5299 Sep 21 10:04 dncp_block_verification.log.curr
> 
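
For anyone trying to reproduce the sequence above, a minimal MiniDFSCluster
sketch (hypothetical test code, not part of this JIRA or its patches) that
drives the same steps would look roughly like this:

{code}
// Hypothetical reproduction sketch; not from any attached patch.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DFSTestUtil;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class ReplicaCountSettleRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    MiniDFSCluster cluster =
        new MiniDFSCluster.Builder(conf).numDataNodes(3).build();
    try {
      cluster.waitActive();
      DistributedFileSystem fs = cluster.getFileSystem();
      Path p = new Path("/from_dn2");
      DFSTestUtil.createFile(fs, p, 17L, (short) 2, 0L); // replication = 2
      DFSTestUtil.waitReplication(fs, p, (short) 2);

      // Stop one datanode (index 0 for illustration; a real test would stop
      // a node that actually holds a replica) and wait for re-replication.
      MiniDFSCluster.DataNodeProperties dn = cluster.stopDataNode(0);
      DFSTestUtil.waitReplication(fs, p, (short) 2);

      // Bring the node back and inspect what the namenode reports.
      cluster.restartDataNode(dn, true);
      FileStatus st = fs.getFileStatus(p);
      BlockLocation[] locs = fs.getFileBlockLocations(st, 0, st.getLen());
      System.out.println("reported replicas: " + locs[0].getHosts().length);
    } finally {
      cluster.shutdown();
    }
  }
}
{code}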

[jira] [Updated] (HDFS-11078) NPE in LazyPersistFileScrubber

2017-05-22 Thread Inigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated HDFS-11078:
---
Attachment: HDFS-11078-branch-2.7.patch

> NPE in LazyPersistFileScrubber
> --
>
> Key: HDFS-11078
> URL: https://issues.apache.org/jira/browse/HDFS-11078
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Inigo Goiri
>Assignee: Inigo Goiri
> Attachments: HDFS-11078.000.patch, HDFS-11078.001.patch, 
> HDFS-11078-branch-2.7.patch
>
>
> If a block is removed, it will be removed from the block map. When
> clearCorruptLazyPersistFiles() then tries to delete the block, it may already
> be gone, which generates a NullPointerException.
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.clearCorruptLazyPersistFiles(FSNamesystem.java:3820)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.run(FSNamesystem.java:3851)
> at java.lang.Thread.run(Thread.java:745)
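
The race itself is easy to state: the scrubber iterates over references that a
concurrent deletion may already have removed from the block map. A
self-contained toy illustration of the null guard that avoids the NPE (this is
the pattern only, not the attached patch):

{code}
// Toy illustration of guarding a stale block-map lookup; the real fix is in
// the attached HDFS-11078 patches.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ScrubberNullGuard {
  static final Map<Long, String> blockMap = new ConcurrentHashMap<>();

  static void scrub(long blockId) {
    String block = blockMap.get(blockId);
    if (block == null) {
      return; // block already removed concurrently; skip, don't dereference
    }
    System.out.println("deleting " + block);
  }

  public static void main(String[] args) {
    blockMap.put(1L, "blk_1");
    blockMap.remove(1L); // the concurrent removal wins the race
    scrub(1L);           // safe: guarded lookup, no NullPointerException
  }
}
{code}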



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11078) NPE in LazyPersistFileScrubber

2017-05-22 Thread Inigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated HDFS-11078:
---
Target Version/s: 2.7.4
  Status: In Progress  (was: Patch Available)

> NPE in LazyPersistFileScrubber
> --
>
> Key: HDFS-11078
> URL: https://issues.apache.org/jira/browse/HDFS-11078
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Inigo Goiri
>Assignee: Inigo Goiri
> Attachments: HDFS-11078.000.patch, HDFS-11078.001.patch, 
> HDFS-11078-branch-2.7.patch
>
>
> If a block is removed, it will be removed from the block map. When
> clearCorruptLazyPersistFiles() then tries to delete the block, it may already
> be gone, which generates a NullPointerException.
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.clearCorruptLazyPersistFiles(FSNamesystem.java:3820)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.run(FSNamesystem.java:3851)
> at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11535) Performance analysis of new DFSNetworkTopology#chooseRandom

2017-05-22 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020392#comment-16020392
 ] 

Arpit Agarwal commented on HDFS-11535:
--

Thanks [~vagarychen].

+1 pending Jenkins (which should be a formality).

> Performance analysis of new DFSNetworkTopology#chooseRandom
> ---
>
> Key: HDFS-11535
> URL: https://issues.apache.org/jira/browse/HDFS-11535
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-11535.001.patch, HDFS-11535.002.patch, 
> HDFS-11535.003.patch, HDFS-11535.004.patch, PerfTest.pdf
>
>
> This JIRA is created to post the results of some performance experiments we
> did. For those who are interested, please see the attached .pdf file for more
> detail. The attached patch file includes the experiment code we ran.
> The key insight we got from these tests is that although *the new method
> outperforms the current one in most cases*, there is still *one case where
> the current one is better*: when there is only one storage type in the
> cluster, and we always look for that storage type. In this case, it is simply
> a waste of time to perform storage-type-based pruning; blindly picking a
> random node (the current method) would suffice.
> Therefore, based on the analysis, we propose to use a *combination of both
> the old and the new methods*:
> say we search for a node of type X; since inner nodes now all keep storage
> type info, we can *just check the root node to see if X is the only type it has*.
> If yes, blindly picking a random leaf will work, so we simply call the old
> method; otherwise we call the new method.
> There is still at least one missing piece in this performance test, which is
> garbage collection. The new method does a few more object creations when
> doing the search, which adds GC overhead. I'm still thinking of potential
> optimizations, but this seems tricky, and I'm not sure whether the
> optimization is worth doing at all. Please feel free to leave any
> comments/suggestions.
> Thanks [~arpitagarwal] and [~szetszwo] for the offline discussion.
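
A self-contained sketch of the proposed combined dispatch (all names below are
stand-ins; see the attached patches for the real DFSNetworkTopology code):

{code}
// Sketch of the old/new dispatch proposed above; names are hypothetical.
import java.util.EnumSet;

public class CombinedChooseRandom {
  enum StorageType { DISK, SSD, ARCHIVE, RAM_DISK }

  private final EnumSet<StorageType> typesInCluster;

  CombinedChooseRandom(EnumSet<StorageType> typesInCluster) {
    this.typesInCluster = typesInCluster;
  }

  String chooseRandom(String scope, StorageType wanted) {
    // The root inner node knows every storage type present in the cluster.
    // If `wanted` is the only one, type-based pruning cannot exclude any
    // node, so fall back to the cheaper legacy path.
    if (typesInCluster.size() == 1 && typesInCluster.contains(wanted)) {
      return chooseRandomLegacy(scope);
    }
    return chooseRandomWithType(scope, wanted);
  }

  // Stubs standing in for the old and new implementations.
  String chooseRandomLegacy(String scope) { return "legacy:" + scope; }
  String chooseRandomWithType(String scope, StorageType t) { return "typed:" + scope; }

  public static void main(String[] args) {
    CombinedChooseRandom t =
        new CombinedChooseRandom(EnumSet.of(StorageType.DISK));
    System.out.println(t.chooseRandom("/default-rack", StorageType.DISK));
  }
}
{code}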



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10480) Add an admin command to list currently open files

2017-05-22 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020393#comment-16020393
 ] 

Andrew Wang commented on HDFS-10480:


I'll review this shortly.

bq. I never understand the need to check twice for 
checkOperation(OperationCategory.READ). It is all over namenode code.

The issue is that the NN can go from being active to standby between the two 
checks. While the FSN lock is held, it will not transition HA state. The second 
check is sufficient for correctness, but the first one helps as an early-exit 
for performance.
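
For reference, the pattern being discussed has roughly this shape (a
schematic, self-contained sketch, not a verbatim FSNamesystem excerpt):

{code}
// Schematic of the double-check pattern: cheap early exit, then the
// authoritative re-check under the FSN lock.
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class HaReadGuard {
  enum OperationCategory { READ, WRITE }

  private volatile boolean active = true; // toggled by HA transitions
  private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();

  void checkOperation(OperationCategory op) {
    if (!active) {
      throw new IllegalStateException("standby: cannot serve " + op);
    }
  }

  String getBlockLocations() {
    checkOperation(OperationCategory.READ); // early exit, performance only
    fsLock.readLock().lock();
    try {
      // HA state cannot transition while the lock is held, so this second
      // check is the one that guarantees correctness.
      checkOperation(OperationCategory.READ);
      return "...locations...";
    } finally {
      fsLock.readLock().unlock();
    }
  }
}
{code}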

> Add an admin command to list currently open files
> -
>
> Key: HDFS-10480
> URL: https://issues.apache.org/jira/browse/HDFS-10480
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Kihwal Lee
>Assignee: Manoj Govindassamy
> Attachments: HDFS-10480.02.patch, HDFS-10480.03.patch, 
> HDFS-10480.04.patch, HDFS-10480-trunk-1.patch, HDFS-10480-trunk.patch
>
>
> Currently there is no easy way to obtain the list of active leases or files
> being written. It would be nice to have an admin command to list open files
> and their lease holders.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11535) Performance analysis of new DFSNetworkTopology#chooseRandom

2017-05-22 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-11535:
--
Attachment: HDFS-11535.004.patch

Thanks [~arpitagarwal] for the comments! Posted the v004 patch with a number of
style updates.

> Performance analysis of new DFSNetworkTopology#chooseRandom
> ---
>
> Key: HDFS-11535
> URL: https://issues.apache.org/jira/browse/HDFS-11535
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-11535.001.patch, HDFS-11535.002.patch, 
> HDFS-11535.003.patch, HDFS-11535.004.patch, PerfTest.pdf
>
>
> This JIRA is created to post the results of some performance experiments we
> did. For those who are interested, please see the attached .pdf file for more
> detail. The attached patch file includes the experiment code we ran.
> The key insight we got from these tests is that although *the new method
> outperforms the current one in most cases*, there is still *one case where
> the current one is better*: when there is only one storage type in the
> cluster, and we always look for that storage type. In this case, it is simply
> a waste of time to perform storage-type-based pruning; blindly picking a
> random node (the current method) would suffice.
> Therefore, based on the analysis, we propose to use a *combination of both
> the old and the new methods*:
> say we search for a node of type X; since inner nodes now all keep storage
> type info, we can *just check the root node to see if X is the only type it has*.
> If yes, blindly picking a random leaf will work, so we simply call the old
> method; otherwise we call the new method.
> There is still at least one missing piece in this performance test, which is
> garbage collection. The new method does a few more object creations when
> doing the search, which adds GC overhead. I'm still thinking of potential
> optimizations, but this seems tricky, and I'm not sure whether the
> optimization is worth doing at all. Please feel free to leave any
> comments/suggestions.
> Thanks [~arpitagarwal] and [~szetszwo] for the offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9412) getBlocks occupies FSLock and takes too long to complete

2017-05-22 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-9412:
--
Attachment: HDFS-9412-branch-2.7.00.patch

> getBlocks occupies FSLock and takes too long to complete
> 
>
> Key: HDFS-9412
> URL: https://issues.apache.org/jira/browse/HDFS-9412
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: He Tianyi
>Assignee: He Tianyi
> Fix For: 2.8.0, 2.7.4, 3.0.0-alpha1
>
> Attachments: HDFS-9412..patch, HDFS-9412.0001.patch, 
> HDFS-9412.0002.patch, HDFS-9412-branch-2.7.00.patch
>
>
> {{getBlocks}} in {{NameNodeRpcServer}} acquires a read lock and then may take a
> long time to complete (probably several seconds, if the number of blocks is
> very large).
> During this period, other threads attempting to acquire the write lock will
> wait. In an extreme case, the RPC handlers are occupied by one reader thread
> calling {{getBlocks}} while all other threads wait for the write lock, and the
> RPC server appears to hang. Unfortunately, this tends to happen in heavily
> loaded clusters, since read operations come and go fast (they do not need to
> wait), leaving write operations waiting.
> It looks like we can optimize this the way the DN block report was optimized
> in the past: by splitting the operation into smaller sub-operations, and
> letting other threads do their work between each sub-operation. The whole
> result is still returned at once, though (one thing different from the DN
> block report).
> I am not sure whether this will work. Any better idea?
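
A self-contained sketch of the chunked-locking idea described above (assumed
structure only, not a committed change):

{code}
// Sketch: release and reacquire the read lock between small batches so
// writers can interleave, while still returning the whole result at once.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ChunkedGetBlocks {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final List<String> allBlocks = new ArrayList<>();

  List<String> getBlocks(int batchSize) {
    List<String> result = new ArrayList<>();
    int from = 0;
    while (true) {
      lock.readLock().lock();
      try {
        int to = Math.min(from + batchSize, allBlocks.size());
        result.addAll(allBlocks.subList(from, to)); // one sub-operation
        if (to == allBlocks.size()) {
          return result; // whole result still returned at once
        }
        from = to;
      } finally {
        lock.readLock().unlock(); // writers may acquire the lock here
      }
      // NOTE: a real implementation must tolerate the block list changing
      // between batches (e.g. re-validate the cursor), which this toy skips.
    }
  }
}
{code}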



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11808) Backport HDFS-8549 to branch-2.7: Abort the balancer if an upgrade is in progress

2017-05-22 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11808:
-
Attachment: HDFS-11808-branch-2.7-01.patch

rebased.

> Backport HDFS-8549 to branch-2.7: Abort the balancer if an upgrade is in 
> progress
> -
>
> Key: HDFS-11808
> URL: https://issues.apache.org/jira/browse/HDFS-11808
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Vinitha Reddy Gankidi
>Assignee: Akira Ajisaka
> Attachments: HDFS-11808-branch-2.7-01.patch
>
>
> As per discussion in the [mailing
> list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser],
> backport HDFS-8549 to branch-2.7.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11808) Backport HDFS-8549 to branch-2.7: Abort the balancer if an upgrade is in progress

2017-05-22 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11808:
-
Status: Patch Available  (was: Open)

> Backport HDFS-8549 to branch-2.7: Abort the balancer if an upgrade is in 
> progress
> -
>
> Key: HDFS-11808
> URL: https://issues.apache.org/jira/browse/HDFS-11808
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Vinitha Reddy Gankidi
>Assignee: Akira Ajisaka
> Attachments: HDFS-11808-branch-2.7-01.patch
>
>
> As per discussion in the [mailing
> list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser],
> backport HDFS-8549 to branch-2.7.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11837) Backport HDFS-9710 to branch-2.7: Change DN to send block receipt IBRs in batches

2017-05-22 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-11837:
-
Status: Patch Available  (was: Open)

> Backport HDFS-9710 to branch-2.7: Change DN to send block receipt IBRs in 
> batches
> -
>
> Key: HDFS-11837
> URL: https://issues.apache.org/jira/browse/HDFS-11837
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Vinitha Reddy Gankidi
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-9710-branch-2.7.00.patch
>
>
> As per discussion in the [mailing
> list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser],
> backport HDFS-9710 to branch-2.7.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-11808) Backport HDFS-8549 to branch-2.7: Abort the balancer if an upgrade is in progress

2017-05-22 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka reassigned HDFS-11808:


Assignee: Akira Ajisaka

> Backport HDFS-8549 to branch-2.7: Abort the balancer if an upgrade is in 
> progress
> -
>
> Key: HDFS-11808
> URL: https://issues.apache.org/jira/browse/HDFS-11808
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Vinitha Reddy Gankidi
>Assignee: Akira Ajisaka
>
> As per discussion in the [mailing
> list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser],
> backport HDFS-8549 to branch-2.7.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11837) Backport HDFS-9710 to branch-2.7: Change DN to send block receipt IBRs in batches

2017-05-22 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-11837:
-
Attachment: HDFS-9710-branch-2.7.00.patch

> Backport HDFS-9710 to branch-2.7: Change DN to send block receipt IBRs in 
> batches
> -
>
> Key: HDFS-11837
> URL: https://issues.apache.org/jira/browse/HDFS-11837
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Vinitha Reddy Gankidi
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-9710-branch-2.7.00.patch
>
>
> As per discussion in the [mailing
> list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser],
> backport HDFS-9710 to branch-2.7.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-2538) option to disable fsck dots

2017-05-22 Thread Akira Ajisaka (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020357#comment-16020357
 ] 

Akira Ajisaka edited comment on HDFS-2538 at 5/22/17 11:01 PM:
---

bq. shall we remove 2.7.4 from target version and releaseblocker from label..?
I think yes. What do you think [~shv]?


was (Author: ajisakaa):
bq. shall we remove 2.7.4 from target version and releaseblocker from label..?
I think yes.

> option to disable fsck dots 
> 
>
> Key: HDFS-2538
> URL: https://issues.apache.org/jira/browse/HDFS-2538
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Allen Wittenauer
>Assignee: Mohammad Kamrul Islam
>Priority: Minor
>  Labels: newbie, release-blocker
> Fix For: 3.0.0-alpha1
>
> Attachments: HDFS-2538.1.patch, HDFS-2538.2.patch, HDFS-2538.3.patch, 
> HDFS-2538-branch-0.20-security-204.patch, 
> HDFS-2538-branch-0.20-security-204.patch, HDFS-2538-branch-1.0.patch, 
> HDFS-2538-branch-2.7.patch
>
>
> This patch turns the dots during fsck off by default and provides an option
> to turn them back on if you have a fetish for millions and millions of dots
> on your terminal. I haven't done any benchmarks, but I suspect fsck is now
> 300% faster to boot.
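
As a rough illustration of the shape of the change (the option wiring below is
an assumption for the sketch, not necessarily what the committed patch does):

{code}
// Illustration only: dot printing becomes opt-in instead of always-on.
import java.io.PrintWriter;

public class FsckProgress {
  private final boolean showProgress;

  FsckProgress(boolean showProgress) { // off by default per this JIRA
    this.showProgress = showProgress;
  }

  void onFileChecked(PrintWriter out) {
    if (showProgress) {
      out.print('.'); // the famous dots, now only when asked for
    }
  }
}
{code}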



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-2538) option to disable fsck dots

2017-05-22 Thread Akira Ajisaka (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020357#comment-16020357
 ] 

Akira Ajisaka commented on HDFS-2538:
-

bq. shall we remove 2.7.4 from target version and releaseblocker from label..?
I think yes.

> option to disable fsck dots 
> 
>
> Key: HDFS-2538
> URL: https://issues.apache.org/jira/browse/HDFS-2538
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Allen Wittenauer
>Assignee: Mohammad Kamrul Islam
>Priority: Minor
>  Labels: newbie, release-blocker
> Fix For: 3.0.0-alpha1
>
> Attachments: HDFS-2538.1.patch, HDFS-2538.2.patch, HDFS-2538.3.patch, 
> HDFS-2538-branch-0.20-security-204.patch, 
> HDFS-2538-branch-0.20-security-204.patch, HDFS-2538-branch-1.0.patch, 
> HDFS-2538-branch-2.7.patch
>
>
> This patch turns the dots during fsck off by default and provides an option
> to turn them back on if you have a fetish for millions and millions of dots
> on your terminal. I haven't done any benchmarks, but I suspect fsck is now
> 300% faster to boot.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11817) A faulty node can cause a lease leak and NPE on accessing data

2017-05-22 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-11817:
--
Attachment: HDFS-11817.v2.trunk.patch
HDFS-11817.v2.branch-2.patch

> A faulty node can cause a lease leak and NPE on accessing data
> --
>
> Key: HDFS-11817
> URL: https://issues.apache.org/jira/browse/HDFS-11817
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-11817.branch-2.patch, hdfs-11817_supplement.txt, 
> HDFS-11817.v2.branch-2.patch, HDFS-11817.v2.trunk.patch
>
>
> When the namenode performs a lease recovery for a failed write,
> {{commitBlockSynchronization()}} will fail if none of the new targets has
> sent a received-IBR.  At this point, the data is inaccessible, as the
> namenode will throw a {{NullPointerException}} upon {{getBlockLocations()}}.
> The lease recovery will be retried in about an hour by the namenode. If the
> nodes are faulty (usually when there is only one new target), they may not
> send a block report until this point. If this happens, lease recovery throws
> an {{AlreadyBeingCreatedException}}, which causes LeaseManager to simply
> remove the lease without finalizing the inode.
> This results in an inconsistent lease state. The inode stays
> under-construction, but no more lease recovery is attempted. A manual lease
> recovery is also not allowed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11866) JournalNode Sync should be off by default in hdfs-default.xml

2017-05-22 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020340#comment-16020340
 ] 

Arpit Agarwal commented on HDFS-11866:
--

+1 pending Jenkins.

> JournalNode Sync should be off by default in hdfs-default.xml
> -
>
> Key: HDFS-11866
> URL: https://issues.apache.org/jira/browse/HDFS-11866
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
> Attachments: HDFS-11866.001.patch
>
>
> dfs.journalnode.enable.sync is set to true in hdfs-default.xml. It should be 
> set to false to disable the feature by default, as discussed in HDFS-4025.
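
For clarity, the change amounts to flipping the shipped default in
hdfs-default.xml along these lines (the description text is omitted here):

{code}
<property>
  <name>dfs.journalnode.enable.sync</name>
  <value>false</value>
</property>
{code}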



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-11869) Backport HDFS-11078 to branch 2.7

2017-05-22 Thread Inigo Goiri (JIRA)
Inigo Goiri created HDFS-11869:
--

 Summary: Backport HDFS-11078 to branch 2.7
 Key: HDFS-11869
 URL: https://issues.apache.org/jira/browse/HDFS-11869
 Project: Hadoop HDFS
  Issue Type: Task
Reporter: Inigo Goiri
Assignee: Inigo Goiri
 Attachments: HDFS-11078-branch-2.7.patch





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11869) Backport HDFS-11078 to branch 2.7

2017-05-22 Thread Inigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated HDFS-11869:
---
Attachment: HDFS-11078-branch-2.7.patch

> Backport HDFS-11078 to branch 2.7
> -
>
> Key: HDFS-11869
> URL: https://issues.apache.org/jira/browse/HDFS-11869
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Inigo Goiri
>Assignee: Inigo Goiri
> Attachments: HDFS-11078-branch-2.7.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11869) Backport HDFS-11078 to branch 2.7

2017-05-22 Thread Inigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated HDFS-11869:
---
Status: In Progress  (was: Patch Available)

> Backport HDFS-11078 to branch 2.7
> -
>
> Key: HDFS-11869
> URL: https://issues.apache.org/jira/browse/HDFS-11869
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Inigo Goiri
>Assignee: Inigo Goiri
> Attachments: HDFS-11078-branch-2.7.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11869) Backport HDFS-11078 to branch 2.7

2017-05-22 Thread Inigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated HDFS-11869:
---
Status: Patch Available  (was: Open)

> Backport HDFS-11078 to branch 2.7
> -
>
> Key: HDFS-11869
> URL: https://issues.apache.org/jira/browse/HDFS-11869
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Inigo Goiri
>Assignee: Inigo Goiri
> Attachments: HDFS-11078-branch-2.7.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11868) Backport HDFS-8674 to branch 2.7

2017-05-22 Thread Inigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated HDFS-11868:
---
Status: In Progress  (was: Patch Available)

> Backport HDFS-8674 to branch 2.7
> 
>
> Key: HDFS-11868
> URL: https://issues.apache.org/jira/browse/HDFS-11868
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Inigo Goiri
>Assignee: Inigo Goiri
> Attachments: HDFS-8674-branch-2.7.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-11868) Backport HDFS-8674 to branch 2.7

2017-05-22 Thread Inigo Goiri (JIRA)
Inigo Goiri created HDFS-11868:
--

 Summary: Backport HDFS-8674 to branch 2.7
 Key: HDFS-11868
 URL: https://issues.apache.org/jira/browse/HDFS-11868
 Project: Hadoop HDFS
  Issue Type: Task
Reporter: Inigo Goiri
Assignee: Inigo Goiri
 Attachments: HDFS-8674-branch-2.7.patch





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11868) Backport HDFS-8674 to branch 2.7

2017-05-22 Thread Inigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated HDFS-11868:
---
Attachment: HDFS-8674-branch-2.7.patch

> Backport HDFS-8674 to branch 2.7
> 
>
> Key: HDFS-11868
> URL: https://issues.apache.org/jira/browse/HDFS-11868
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Inigo Goiri
>Assignee: Inigo Goiri
> Attachments: HDFS-8674-branch-2.7.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11868) Backport HDFS-8674 to branch 2.7

2017-05-22 Thread Inigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated HDFS-11868:
---
Status: Patch Available  (was: Open)

> Backport HDFS-8674 to branch 2.7
> 
>
> Key: HDFS-11868
> URL: https://issues.apache.org/jira/browse/HDFS-11868
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Inigo Goiri
>Assignee: Inigo Goiri
> Attachments: HDFS-8674-branch-2.7.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11867) Backport HDFS-6291 to branch 2.7

2017-05-22 Thread Inigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated HDFS-11867:
---
Attachment: HDFS-6291-branch-2.7.patch

> Backport HDFS-6291 to branch 2.7
> 
>
> Key: HDFS-11867
> URL: https://issues.apache.org/jira/browse/HDFS-11867
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Inigo Goiri
>Assignee: Inigo Goiri
> Attachments: HDFS-6291-branch-2.7.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11867) Backport HDFS-6291 to branch 2.7

2017-05-22 Thread Inigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated HDFS-11867:
---
Status: In Progress  (was: Patch Available)

> Backport HDFS-6291 to branch 2.7
> 
>
> Key: HDFS-11867
> URL: https://issues.apache.org/jira/browse/HDFS-11867
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Inigo Goiri
>Assignee: Inigo Goiri
> Attachments: HDFS-6291-branch-2.7.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11867) Backport HDFS-6291 to branch 2.7

2017-05-22 Thread Inigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated HDFS-11867:
---
Status: Patch Available  (was: Open)

> Backport HDFS-6291 to branch 2.7
> 
>
> Key: HDFS-11867
> URL: https://issues.apache.org/jira/browse/HDFS-11867
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Inigo Goiri
>Assignee: Inigo Goiri
> Attachments: HDFS-6291-branch-2.7.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-11867) Backport HDFS-6291 to branch 2.7

2017-05-22 Thread Inigo Goiri (JIRA)
Inigo Goiri created HDFS-11867:
--

 Summary: Backport HDFS-6291 to branch 2.7
 Key: HDFS-11867
 URL: https://issues.apache.org/jira/browse/HDFS-11867
 Project: Hadoop HDFS
  Issue Type: Task
Reporter: Inigo Goiri
Assignee: Inigo Goiri






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-05-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020326#comment-16020326
 ] 

Hadoop QA commented on HDFS-11741:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 36s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 2 new + 71 unchanged - 0 fixed = 73 total (was 71) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m 51s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 97m  4s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.web.TestWebHDFS |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080 |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|   | hadoop.hdfs.TestDFSRSDefault10x4StripedOutputStreamWithFailure |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-11741 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12869326/HDFS-11741.004.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux d658dd0b9b9e 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 
09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 9cab42c |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19545/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19545/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19545/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19545/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.

[jira] [Updated] (HDFS-11866) JournalNode Sync should be off by default in hdfs-default.xml

2017-05-22 Thread Hanisha Koneru (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDFS-11866:
--
Attachment: HDFS-11866.001.patch

> JournalNode Sync should be off by default in hdfs-default.xml
> -
>
> Key: HDFS-11866
> URL: https://issues.apache.org/jira/browse/HDFS-11866
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
> Attachments: HDFS-11866.001.patch
>
>
> dfs.journalnode.enable.sync is set to true in hdfs-default.xml. It should be 
> set to false to disable the feature by default, as discussed in HDFS-4025.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11866) JournalNode Sync should be off by default in hdfs-default.xml

2017-05-22 Thread Hanisha Koneru (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDFS-11866:
--
Status: Patch Available  (was: Open)

> JournalNode Sync should be off by default in hdfs-default.xml
> -
>
> Key: HDFS-11866
> URL: https://issues.apache.org/jira/browse/HDFS-11866
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
> Attachments: HDFS-11866.001.patch
>
>
> dfs.journalnode.enable.sync is set to true in hdfs-default.xml. It should be 
> set to false to disable the feature by default, as discussed in HDFS-4025.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-11866) JournalNode Sync should be off by default in hdfs-default.xml

2017-05-22 Thread Hanisha Koneru (JIRA)
Hanisha Koneru created HDFS-11866:
-

 Summary: JournalNode Sync should be off by default in 
hdfs-default.xml
 Key: HDFS-11866
 URL: https://issues.apache.org/jira/browse/HDFS-11866
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Hanisha Koneru
Assignee: Hanisha Koneru


dfs.journalnode.enable.sync is set to true in hdfs-default.xml. It should be 
set to false to disable the feature by default, as discussed in HDFS-4025.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11817) A faulty node can cause a lease leak and NPE on accessing data

2017-05-22 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020252#comment-16020252
 ] 

Kihwal Lee commented on HDFS-11817:
---

In trunk, there already is logic to weed out null StorageInfo objects before
putting one into the expected locations. This was done as part of HDFS-9040. It
too had TestRetryCacheWithHA failures, so it was also fixed as part of
HDFS-9040, although I believe my fix is better. As it is an EC-related change,
that JIRA cannot be applied to branch-2. I will back-port the relevant portion
in my patch, so that trunk and branch-2/2.8 stay more in sync. The trunk
version of my patch will contain the test case (HDFS-9040 did not add a new
test case for this) and the lease manager fix.
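
Schematically, the "weed out null StorageInfo" guard described above looks like
this (a hypothetical shape; see the v2 patches for the real change):

{code}
// Sketch: skip targets whose storage info is still null (no received-IBR
// yet) instead of recording a null that later NPEs getBlockLocations().
import java.util.ArrayList;
import java.util.List;

public class ExpectedLocations {
  static List<String> filterStorages(String[] storages) {
    List<String> expected = new ArrayList<>();
    for (String storage : storages) {
      if (storage != null) { // null: target never reported this replica
        expected.add(storage);
      }
    }
    return expected;
  }
}
{code}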

> A faulty node can cause a lease leak and NPE on accessing data
> --
>
> Key: HDFS-11817
> URL: https://issues.apache.org/jira/browse/HDFS-11817
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-11817.branch-2.patch, hdfs-11817_supplement.txt
>
>
> When the namenode performs a lease recovery for a failed write,
> {{commitBlockSynchronization()}} will fail if none of the new targets has
> sent a received-IBR.  At this point, the data is inaccessible, as the
> namenode will throw a {{NullPointerException}} upon {{getBlockLocations()}}.
> The lease recovery will be retried in about an hour by the namenode. If the
> nodes are faulty (usually when there is only one new target), they may not
> send a block report until this point. If this happens, lease recovery throws
> an {{AlreadyBeingCreatedException}}, which causes LeaseManager to simply
> remove the lease without finalizing the inode.
> This results in an inconsistent lease state. The inode stays
> under-construction, but no more lease recovery is attempted. A manual lease
> recovery is also not allowed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11865) Ozone: Do not initialize Ratis cluster during datanode startup

2017-05-22 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-11865:
---
Status: Patch Available  (was: Open)

> Ozone: Do not initialize Ratis cluster during datanode startup
> --
>
> Key: HDFS-11865
> URL: https://issues.apache.org/jira/browse/HDFS-11865
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: HDFS-11865-HDFS-7240.20170522.patch
>
>
> During datanode startup, we currently pass dfs.container.ratis.conf so that
> the datanode is bound to a particular Ratis cluster.
> In this JIRA, we change the datanode so that it is no longer bound to any
> Ratis cluster during startup. We use the Ratis reinitialize request
> (RATIS-86) to set up a Ratis cluster later on.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11865) Ozone: Do not initialize Ratis cluster during datanode startup

2017-05-22 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-11865:
---
Attachment: HDFS-11865-HDFS-7240.20170522.patch

HDFS-11865-HDFS-7240.20170522.patch: 1st patch.

> Ozone: Do not initialize Ratis cluster during datanode startup
> --
>
> Key: HDFS-11865
> URL: https://issues.apache.org/jira/browse/HDFS-11865
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: HDFS-11865-HDFS-7240.20170522.patch
>
>
> During datanode startup, we currently pass dfs.container.ratis.conf so that
> the datanode is bound to a particular Ratis cluster.
> In this JIRA, we change the datanode so that it is no longer bound to any
> Ratis cluster during startup. We use the Ratis reinitialize request
> (RATIS-86) to set up a Ratis cluster later on.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-11865) Ozone: Do not initialize Ratis cluster during datanode startup

2017-05-22 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-11865:
--

 Summary: Ozone: Do not initialize Ratis cluster during datanode 
startup
 Key: HDFS-11865
 URL: https://issues.apache.org/jira/browse/HDFS-11865
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


During datanode startup, we currently pass dfs.container.ratis.conf so that the
datanode is bound to a particular Ratis cluster.

In this JIRA, we change the datanode so that it is no longer bound to any Ratis
cluster during startup. We use the Ratis reinitialize request (RATIS-86) to set
up a Ratis cluster later on.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7337) Configurable and pluggable Erasure Codec and schema

2017-05-22 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020209#comment-16020209
 ] 

Lei (Eddy) Xu commented on HDFS-7337:
-

Hi, [~drankye] and [~Sammi]

Thanks a lot for the reply. The explanation helps a lot.

bq. There are several system wide codecs to use, including RS codec, RS legacy 
codec and XOR codec.

Is there a way to choose a system-wide *_default_* codec, so that after the
cluster is initialized, users and admins can just specify a zone / directory to
be "erasure coded", instead of choosing from several different codecs, each of
which has its own trade-offs that the user / admin needs to understand?

bq. while it supports to add / remove policies using CLI, it does not support
to enable / use the policy via CLI?

My concern is that, even if the admin is able to add a policy via the API
*dynamically*, it still requires the admin to reboot the NN, or ssh into the
NN, change conf files, and reload NN confs, to enable the policy. This makes
the workflow complicated. I think using the API / CLI and ssh-ing into the NN /
changing conf files should be two different sets of operations. If possible, it
is more consistent to do the EC policy management entirely in one, or both. The
current design does half of the management in each approach.

Thanks.


> Configurable and pluggable Erasure Codec and schema
> ---
>
> Key: HDFS-7337
> URL: https://issues.apache.org/jira/browse/HDFS-7337
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: erasure-coding
>Reporter: Zhe Zhang
>Priority: Critical
>  Labels: hdfs-ec-3.0-nice-to-have
> Attachments: HDFS-7337-prototype-v1.patch, 
> HDFS-7337-prototype-v2.zip, HDFS-7337-prototype-v3.zip, 
> PluggableErasureCodec.pdf, PluggableErasureCodec-v2.pdf, 
> PluggableErasureCodec-v3.pdf, PluggableErasureCodec v4.pdf
>
>
> According to HDFS-7285 and the design, this considers supporting multiple
> Erasure Codecs via a pluggable approach. It allows defining and configuring
> multiple codec schemas with different coding algorithms and parameters. The
> resultant codec schemas can be utilized and specified via a command tool for
> different file folders. While designing and implementing such a pluggable
> framework, we also implement a concrete codec by default (Reed-Solomon) to
> prove the framework is useful and workable. A separate JIRA could be opened
> for the RS codec implementation.
> Note HDFS-7353 will focus on the very low-level codec API and implementation
> to make concrete vendor libraries transparent to the upper layer. This JIRA
> focuses on high-level concerns that interact with configuration, schema, etc.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11770) Ozone: KSM: Add setVolumeProperty

2017-05-22 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-11770:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
Target Version/s: HDFS-7240
  Status: Resolved  (was: Patch Available)

Thanks [~msingh] for the contribution. I've committed the latest patch to the
feature branch.

> Ozone: KSM: Add setVolumeProperty
> -
>
> Key: HDFS-11770
> URL: https://issues.apache.org/jira/browse/HDFS-11770
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Anu Engineer
>Assignee: Mukul Kumar Singh
> Fix For: HDFS-7240
>
> Attachments: HDFS-11770-HDFS-7240.001.patch, 
> HDFS-11770-HDFS-7240.002.patch, HDFS-11770-HDFS-7240.003.patch, 
> HDFS-11770-HDFS-7240.005.patch, HDFS-11770-HDFS-7240.006.patch
>
>
> SetVolumeProperty allows Ozone administrators to change the ownership of a 
> volume and quota.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-05-22 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-11741:
---
Attachment: HDFS-11741.004.patch

Hi Yongjun. The patch should still apply with git apply -3.
I posted a new patch with a small update.

Thinking about it again, I am able to answer the previous question. The
DelegationTokenKey expires in the token lifetime, whereas the current block key
expires in 2 * key update interval + token lifetime, so in most cases the DTK
shouldn't expire before the current BK expires.

I wanted to add unit tests for the code in Dispatcher#dispatch that catches the
InvalidEncryptionKey exception, but the dispatcher code is quite monolithic and
I haven't found a good way to write a unit test. Even an integration test is
not trivial.
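
Conceptually, the recovery path in question has this shape (a schematic; the
KeyManager hook below is an assumption for the sketch, not the real API):

{code}
// Schematic: on an expired-key failure, refresh the cached
// DataEncryptionKey and let the caller retry the move.
import org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException;

public class DispatchSketch {
  interface KeyManager {
    void refreshDataEncryptionKey(); // assumed hook for this sketch
  }

  void dispatch(Runnable move, KeyManager km) {
    try {
      move.run(); // stands in for the SASL handshake + block move
    } catch (RuntimeException e) {
      if (e.getCause() instanceof InvalidEncryptionKeyException) {
        // Key expired mid-run: drop the stale key so the next attempt
        // negotiates with a fresh one.
        km.refreshDataEncryptionKey();
      } else {
        throw e;
      }
    }
  }
}
{code}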

> Long running balancer may fail due to expired DataEncryptionKey
> ---
>
> Key: HDFS-11741
> URL: https://issues.apache.org/jira/browse/HDFS-11741
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
> Environment: CDH5.8.2, Kerberos, Data transfer encryption enabled. 
> Balancer login using keytab
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-11741.001.patch, HDFS-11741.002.patch, 
> HDFS-11741.003.patch, HDFS-11741.004.patch
>
>
> We found that a long-running balancer may fail despite using a keytab,
> because KeyManager returns an expired DataEncryptionKey, which leads to the
> following exception:
> {noformat}
> 2017-04-30 05:03:58,661 WARN  [pool-1464-thread-10] balancer.Dispatcher 
> (Dispatcher.java:dispatch(325)) - Failed to move blk_1067352712_3913241 with 
> size=546650 from 10.0.0.134:50010:DISK to 10.0.0.98:50010:DISK through 
> 10.0.0.134:50010
> org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException: 
> Can't re-compute encryption key for nonce, since the required block key 
> (keyID=1005215027) doesn't exist. Current key: 1005215030
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiatedCipherOption(DataTransferSaslUtil.java:417)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:474)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:299)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:242)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:183)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:311)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2300(Dispatcher.java:182)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:899)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This bug is similar in nature to HDFS-10609. While the balancer's KeyManager 
> actively synchronizes itself with the NameNode w.r.t. block keys, it does 
> not update the DataEncryptionKey accordingly.
> In one specific cluster, with a Kerberos ticket lifetime of 10 hours and the 
> default block token expiration/lifetime of 10 hours, a long-running balancer 
> failed after 20~30 hours.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11860) Ozone: SCM: SCMContainerPlacementCapacity#chooseNode sometimes does not remove chosen node from healthy list.

2017-05-22 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-11860:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
   Fix Version/s: HDFS-7240
Target Version/s: HDFS-7240
  Status: Resolved  (was: Patch Available)

Thanks [~vagarychen] for the review. I've committed the fix to the feature branch. 
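
For context, the fix essentially amounts to removing the chosen node before 
handing it back; a minimal sketch of the equal-index branch (not the exact 
committed diff):
{code}
// SCMContainerPlacementCapacity#chooseNode, equal-index branch (sketch):
if (firstNodeNdx == secondNodeNdx) {
  // List#remove(int) returns the removed element, so the node is handed out
  // and taken off the healthy list for the next round in a single step.
  return healthyNodes.remove(firstNodeNdx);
}
{code}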

> Ozone: SCM: SCMContainerPlacementCapacity#chooseNode sometimes does not 
> remove chosen node from healthy list.
> -
>
> Key: HDFS-11860
> URL: https://issues.apache.org/jira/browse/HDFS-11860
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Fix For: HDFS-7240
>
> Attachments: HDFS-11860-HDFS-7240.001.patch, 
> HDFS-11860-HDFS-7240.002.patch
>
>
> This was caught randomly in Jenkins runs. After debugging, the cause was 
> found to be the logic below: when the two random indexes happen to be the 
> same, the node is returned without being removed from the healthy list for 
> the next round of selection. As a result, duplicated datanodes can be chosen 
> for the pipeline, and the machine list ends up smaller than expected. I will 
> post a fix soon.
> {code}
> SCMContainerPlacementCapacity#chooseNode
>  // There is a possibility that both numbers will be same.
>  // if that is so, we just return the node.
>  if (firstNodeNdx == secondNodeNdx) {
>   return healthyNodes.get(firstNodeNdx);
>  }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11860) Ozone: SCM: SCMContainerPlacementCapacity#chooseNode sometimes does not remove chosen node from healthy list.

2017-05-22 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020121#comment-16020121
 ] 

Xiaoyu Yao commented on HDFS-11860:
---

Thanks [~vagarychen] for the review. I will commit the patch shortly based on 
the Jenkins result and your +1. 

> Ozone: SCM: SCMContainerPlacementCapacity#chooseNode sometimes does not 
> remove chosen node from healthy list.
> -
>
> Key: HDFS-11860
> URL: https://issues.apache.org/jira/browse/HDFS-11860
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-11860-HDFS-7240.001.patch, 
> HDFS-11860-HDFS-7240.002.patch
>
>
> This was caught randomly in Jenkins runs. After debugging, the cause was 
> found to be the logic below: when the two random indexes happen to be the 
> same, the node is returned without being removed from the healthy list for 
> the next round of selection. As a result, duplicated datanodes can be chosen 
> for the pipeline, and the machine list ends up smaller than expected. I will 
> post a fix soon.
> {code}
> SCMContainerPlacementCapacity#chooseNode
>  // There is a possibility that both numbers will be same.
>  // if that is so, we just return the node.
>  if (firstNodeNdx == secondNodeNdx) {
>   return healthyNodes.get(firstNodeNdx);
>  }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11770) Ozone: KSM: Add setVolumeProperty

2017-05-22 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020117#comment-16020117
 ] 

Xiaoyu Yao commented on HDFS-11770:
---

The latest patch looks good to me. +1, I will commit it shortly.

The random test failure in hadoop.ozone.scm.node.TestContainerPlacement will 
be fixed by HDFS-11860.

> Ozone: KSM: Add setVolumeProperty
> -
>
> Key: HDFS-11770
> URL: https://issues.apache.org/jira/browse/HDFS-11770
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Anu Engineer
>Assignee: Mukul Kumar Singh
> Fix For: HDFS-7240
>
> Attachments: HDFS-11770-HDFS-7240.001.patch, 
> HDFS-11770-HDFS-7240.002.patch, HDFS-11770-HDFS-7240.003.patch, 
> HDFS-11770-HDFS-7240.005.patch, HDFS-11770-HDFS-7240.006.patch
>
>
> SetVolumeProperty allows Ozone administrators to change the ownership and 
> quota of a volume.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11859) Ozone : separate blockLocationProtocol out of containerLocationProtocol

2017-05-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020102#comment-16020102
 ] 

Hadoop QA commented on HDFS-11859:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
38s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
36s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
26s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
26s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
24s{color} | {color:green} HDFS-7240 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
32s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs-client in HDFS-7240 
has 2 extant Findbugs warnings. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
51s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in HDFS-7240 has 10 
extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
25s{color} | {color:green} HDFS-7240 passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
7s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
29s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 69m 23s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}106m 56s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.ozone.scm.TestAllocateContainer |
|   | hadoop.ozone.web.TestDistributedOzoneVolumes |
|   | hadoop.cblock.TestLocalBlockCache |
|   | hadoop.ozone.scm.TestContainerSQLCli |
|   | hadoop.ozone.web.TestOzoneWebAccess |
|   | hadoop.ozone.TestContainerOperations |
|   | hadoop.ozone.scm.TestSCMMXBean |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
|   | hadoop.ozone.scm.TestSCMCli |
|   | hadoop.ozone.container.common.impl.TestContainerPersistence |
|   | 

[jira] [Commented] (HDFS-11863) Document missing metrics for blocks count in pending IBR

2017-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020098#comment-16020098
 ] 

Hudson commented on HDFS-11863:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11765 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11765/])
HDFS-11863. Document missing metrics for blocks count in pending IBR. (brahma: 
rev 2a8fcf0c9a9b5293238a8ab76c19d74a6a3bae72)
* (edit) hadoop-common-project/hadoop-common/src/site/markdown/Metrics.md
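
For reference, the added rows in Metrics.md presumably look something like the 
following (a sketch of the documentation table format; the committed wording 
may differ):
{noformat}
| `BlocksInPendingIBR` | Number of blocks in pending incremental block report (IBR) |
| `BlocksReceivingInPendingIBR` | Number of blocks at receiving status in pending IBR |
| `BlocksReceivedInPendingIBR` | Number of blocks at received status in pending IBR |
{noformat}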


> Document missing metrics for blocks count in pending IBR
> 
>
> Key: HDFS-11863
> URL: https://issues.apache.org/jira/browse/HDFS-11863
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.0.0-alpha3
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: HDFS-11863.001.patch, HDFS-11863.002.patch
>
>
> HDFS-11534 introduced some useful metrics for blocks count in pending IBR.
> {noformat}
>   @Metric("Count of blocks in pending IBR")
>   private MutableGaugeLong blocksInPendingIBR;
>   @Metric("Count of blocks at receiving status in pending IBR")
>   private MutableGaugeLong blocksReceivingInPendingIBR;
>   @Metric("Count of blocks at received status in pending IBR")
>   private MutableGaugeLong blocksReceivedInPendingIBR;
> {noformat}
> It will be nice to have in the metrics documentation.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11849) JournalNode startup failure exception should be logged in log file

2017-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020099#comment-16020099
 ] 

Hudson commented on HDFS-11849:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11765 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11765/])
HDFS-11849. JournalNode startup failure exception should be logged in (brahma: 
rev 9cab42cc797986081fef184748044f1790a4f039)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalNode.java
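
The usual shape of this fix is to catch the startup exception in main and log 
it before terminating, so it reaches the JournalNode log file rather than only 
stderr; a minimal sketch, not the committed diff:
{code}
// JournalNode#main (illustrative sketch):
public static void main(String[] args) {
  try {
    StringUtils.startupShutdownMessage(JournalNode.class, args, LOG);
    System.exit(ToolRunner.run(new JournalNode(), args));
  } catch (Throwable t) {
    LOG.error("Failed to start JournalNode.", t);  // lands in the log file
    ExitUtil.terminate(-1, t.getMessage());
  }
}
{code}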


> JournalNode startup failure exception should be logged in log file
> --
>
> Key: HDFS-11849
> URL: https://issues.apache.org/jira/browse/HDFS-11849
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: journal-node
>Affects Versions: 2.7.0
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
> Fix For: 2.9.0, 2.7.4, 3.0.0-alpha3, 2.8.1
>
> Attachments: HDFS-11849-001.patch, HDFS-11849-002.patch
>
>
> JournalNode failed to start because of a Kerberos login failure. 
> {noformat}
> Exception in thread "main" java.io.IOException: Login failure for 
> xxx/y...@.com from keytab dummy.keytab: 
> javax.security.auth.login.LoginException: host1
> at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:994)
> at 
> org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:281)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNode.start(JournalNode.java:153)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNode.run(JournalNode.java:132)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNode.main(JournalNode.java:318)
> {noformat}
> but this exception is not written to the log file:
> {noformat}
> STARTUP_MSG:   java = 1.x.x
> /
> 2017-05-18 16:08:14,961 INFO 
> org.apache.hadoop.hdfs.qjournal.server.JournalNode: registered UNIX signal 
> handlers for [TERM, HUP, INT]
> 2017-05-18 16:08:15,511 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: 
> loaded properties from hadoop-metrics2.properties
> 2017-05-18 16:08:15,660 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period 
> at 10 second(s).
> 2017-05-18 16:08:15,660 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: JournalNode metrics system 
> started
> 2017-05-18 16:08:16,429 INFO 
> org.apache.hadoop.hdfs.qjournal.server.JournalNode: SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down JournalNode at w-x-y-z
> /
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11860) Ozone: SCM: SCMContainerPlacementCapacity#chooseNode sometimes does not remove chosen node from healthy list.

2017-05-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020084#comment-16020084
 ] 

Hadoop QA commented on HDFS-11860:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m  
3s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
48s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
57s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
16s{color} | {color:green} HDFS-7240 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
58s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in HDFS-7240 has 10 
extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 26s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 98m 54s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestDecommissioningStatus |
|   | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover |
|   | hadoop.cblock.TestCBlockServerPersistence |
|   | hadoop.hdfs.TestDFSRSDefault10x4StripedOutputStreamWithFailure |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-11860 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12869301/HDFS-11860-HDFS-7240.002.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux b5e572abf631 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 
09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | HDFS-7240 / b72ac92 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| findbugs | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19543/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19543/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19543/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 

[jira] [Updated] (HDFS-11849) JournalNode startup failure exception should be logged in log file

2017-05-22 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HDFS-11849:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.1
   3.0.0-alpha3
   2.7.4
   2.9.0
   Status: Resolved  (was: Patch Available)

Committed to trunk, branch-2, branch-2.8.1 and branch-2.7. [~surendrasingh], 
thanks for the contribution, and thanks to [~vinayrpet] for the additional 
review.

> JournalNode startup failure exception should be logged in log file
> --
>
> Key: HDFS-11849
> URL: https://issues.apache.org/jira/browse/HDFS-11849
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: journal-node
>Affects Versions: 2.7.0
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
> Fix For: 2.9.0, 2.7.4, 3.0.0-alpha3, 2.8.1
>
> Attachments: HDFS-11849-001.patch, HDFS-11849-002.patch
>
>
> JournalNode failed to start because of a Kerberos login failure. 
> {noformat}
> Exception in thread "main" java.io.IOException: Login failure for 
> xxx/y...@.com from keytab dummy.keytab: 
> javax.security.auth.login.LoginException: host1
> at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:994)
> at 
> org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:281)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNode.start(JournalNode.java:153)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNode.run(JournalNode.java:132)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNode.main(JournalNode.java:318)
> {noformat}
> but this exception is not written to the log file:
> {noformat}
> STARTUP_MSG:   java = 1.x.x
> /
> 2017-05-18 16:08:14,961 INFO 
> org.apache.hadoop.hdfs.qjournal.server.JournalNode: registered UNIX signal 
> handlers for [TERM, HUP, INT]
> 2017-05-18 16:08:15,511 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: 
> loaded properties from hadoop-metrics2.properties
> 2017-05-18 16:08:15,660 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period 
> at 10 second(s).
> 2017-05-18 16:08:15,660 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: JournalNode metrics system 
> started
> 2017-05-18 16:08:16,429 INFO 
> org.apache.hadoop.hdfs.qjournal.server.JournalNode: SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down JournalNode at w-x-y-z
> /
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-2538) option to disable fsck dots

2017-05-22 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019981#comment-16019981
 ] 

Brahma Reddy Battula commented on HDFS-2538:


Shall we remove {{2.7.4}} from the target versions and {{release-blocker}} 
from the labels?

> option to disable fsck dots 
> 
>
> Key: HDFS-2538
> URL: https://issues.apache.org/jira/browse/HDFS-2538
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Allen Wittenauer
>Assignee: Mohammad Kamrul Islam
>Priority: Minor
>  Labels: newbie, release-blocker
> Fix For: 3.0.0-alpha1
>
> Attachments: HDFS-2538.1.patch, HDFS-2538.2.patch, HDFS-2538.3.patch, 
> HDFS-2538-branch-0.20-security-204.patch, 
> HDFS-2538-branch-0.20-security-204.patch, HDFS-2538-branch-1.0.patch, 
> HDFS-2538-branch-2.7.patch
>
>
> this patch turns the dots during fsck off by default and provides an option 
> to turn them back on if you have a fetish for millions and millions of dots 
> on your terminal.  i haven't done any benchmarks, but i suspect fsck is now 
> 300% faster to boot.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11863) Document missing metrics for blocks count in pending IBR

2017-05-22 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HDFS-11863:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0-alpha3
   2.9.0
   Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2. [~linyiqun], thanks for the contribution, and 
[~hanishakoneru], thanks for the additional review.

> Document missing metrics for blocks count in pending IBR
> 
>
> Key: HDFS-11863
> URL: https://issues.apache.org/jira/browse/HDFS-11863
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.0.0-alpha3
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: HDFS-11863.001.patch, HDFS-11863.002.patch
>
>
> HDFS-11534 introduced some useful metrics for blocks count in pending IBR.
> {noformat}
>   @Metric("Count of blocks in pending IBR")
>   private MutableGaugeLong blocksInPendingIBR;
>   @Metric("Count of blocks at receiving status in pending IBR")
>   private MutableGaugeLong blocksReceivingInPendingIBR;
>   @Metric("Count of blocks at received status in pending IBR")
>   private MutableGaugeLong blocksReceivedInPendingIBR;
> {noformat}
> It will be nice to have in the metrics documentation.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11859) Ozone : separate blockLocationProtocol out of containerLocationProtocol

2017-05-22 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-11859:
--
Attachment: HDFS-11859-HDFS-7240.005.patch

I can't repro the binding error when running all the ozone tests locally. I am 
uploading a patch that changes the SCM block service RPC port to 9880, to 
check whether the failure is caused by port 9863 (used in the previous patch) 
being reserved for other purposes on the Jenkins machines. 
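
For anyone validating the port change locally, it presumably surfaces as the 
SCM block service address setting; a hypothetical ozone-site.xml sketch (the 
property name here is an assumption, check the patch for the actual key):
{noformat}
<property>
  <!-- hypothetical key name; the patch defines the authoritative one -->
  <name>ozone.scm.block.client.address</name>
  <value>0.0.0.0:9880</value>
</property>
{noformat}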

> Ozone : separate blockLocationProtocol out of containerLocationProtocol
> ---
>
> Key: HDFS-11859
> URL: https://issues.apache.org/jira/browse/HDFS-11859
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Chen Liang
>Assignee: Xiaoyu Yao
> Attachments: HDFS-11859-HDFS-7240.001.patch, 
> HDFS-11859-HDFS-7240.002.patch, HDFS-11859-HDFS-7240.003.patch, 
> HDFS-11859-HDFS-7240.004.patch, HDFS-11859-HDFS-7240.005.patch
>
>
> Currently {{StorageContainerLocationProtocol}} contains two types of 
> operations: container related operations and block related operations. 
> Although there is {{ScmBlockLocationProtocol}} for block operations, only 
> {{StorageContainerLocationProtocolServerSideTranslatorPB}} makes the 
> distinction. 
> This JIRA tries to make the separation complete and thorough in all places.
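
For readers skimming the thread, the end state is two independently served 
protocols; a rough sketch of the shape (the method signatures below are 
illustrative, not the actual interfaces):
{code}
// Block-level operations live on their own protocol (sketch):
public interface ScmBlockLocationProtocol {
  Set<AllocatedBlock> getBlockLocations(Set<String> keys) throws IOException;
  AllocatedBlock allocateBlock(long size) throws IOException;
}

// Container-level operations stay on the container protocol (sketch):
public interface StorageContainerLocationProtocol {
  Pipeline getContainer(String containerName) throws IOException;
  Pipeline allocateContainer(String containerName) throws IOException;
}
{code}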



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11863) Document missing metrics for blocks count in pending IBR

2017-05-22 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019934#comment-16019934
 ] 

Brahma Reddy Battula commented on HDFS-11863:
-

bq. BTW, have you found any other missing metrics that need to be documented 
across the HDFS project, Brahma Reddy Battula?
Yes, but none at hand; I will raise separate JIRAs.

+1 on the latest patch; I will commit it shortly.

> Document missing metrics for blocks count in pending IBR
> 
>
> Key: HDFS-11863
> URL: https://issues.apache.org/jira/browse/HDFS-11863
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.0.0-alpha3
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
> Attachments: HDFS-11863.001.patch, HDFS-11863.002.patch
>
>
> HDFS-11534 introduced some useful metrics for blocks count in pending IBR.
> {noformat}
>   @Metric("Count of blocks in pending IBR")
>   private MutableGaugeLong blocksInPendingIBR;
>   @Metric("Count of blocks at receiving status in pending IBR")
>   private MutableGaugeLong blocksReceivingInPendingIBR;
>   @Metric("Count of blocks at received status in pending IBR")
>   private MutableGaugeLong blocksReceivedInPendingIBR;
> {noformat}
> It will be nice to have in the metrics documentation.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11860) Ozone: SCM: SCMContainerPlacementCapacity#chooseNode sometimes does not remove chosen node from healthy list.

2017-05-22 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019922#comment-16019922
 ] 

Chen Liang commented on HDFS-11860:
---

Thanks [~xyao] for the v002 patch. +1, pending Jenkins.

> Ozone: SCM: SCMContainerPlacementCapacity#chooseNode sometimes does not 
> remove chosen node from healthy list.
> -
>
> Key: HDFS-11860
> URL: https://issues.apache.org/jira/browse/HDFS-11860
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-11860-HDFS-7240.001.patch, 
> HDFS-11860-HDFS-7240.002.patch
>
>
> This was caught randomly in Jenkins runs. After debugging, the cause was 
> found to be the logic below: when the two random indexes happen to be the 
> same, the node is returned without being removed from the healthy list for 
> the next round of selection. As a result, duplicated datanodes can be chosen 
> for the pipeline, and the machine list ends up smaller than expected. I will 
> post a fix soon.
> {code}
> SCMContainerPlacementCapacity#chooseNode
>  // There is a possibility that both numbers will be same.
>  // if that is so, we just return the node.
>  if (firstNodeNdx == secondNodeNdx) {
>   return healthyNodes.get(firstNodeNdx);
>  }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11860) Ozone: SCM: SCMContainerPlacementCapacity#chooseNode sometimes does not remove chosen node from healthy list.

2017-05-22 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-11860:
--
Attachment: HDFS-11860-HDFS-7240.002.patch

> Ozone: SCM: SCMContainerPlacementCapacity#chooseNode sometimes does not 
> remove chosen node from healthy list.
> -
>
> Key: HDFS-11860
> URL: https://issues.apache.org/jira/browse/HDFS-11860
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-11860-HDFS-7240.001.patch, 
> HDFS-11860-HDFS-7240.002.patch
>
>
> This was caught randomly in Jenkins runs. After debugging, the cause was 
> found to be the logic below: when the two random indexes happen to be the 
> same, the node is returned without being removed from the healthy list for 
> the next round of selection. As a result, duplicated datanodes can be chosen 
> for the pipeline, and the machine list ends up smaller than expected. I will 
> post a fix soon.
> {code}
> SCMContainerPlacementCapacity#chooseNode
>  // There is a possibility that both numbers will be same.
>  // if that is so, we just return the node.
>  if (firstNodeNdx == secondNodeNdx) {
>   return healthyNodes.get(firstNodeNdx);
>  }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11860) Ozone: SCM: SCMContainerPlacementCapacity#chooseNode sometimes does not remove chosen node from healthy list.

2017-05-22 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019904#comment-16019904
 ] 

Xiaoyu Yao commented on HDFS-11860:
---

Thanks [~vagarychen] for the review. The test failures are not related to the 
Ozone changes. 
I've updated the patch to fix the checkstyle (line > 80) issue. 

> Ozone: SCM: SCMContainerPlacementCapacity#chooseNode sometimes does not 
> remove chosen node from healthy list.
> -
>
> Key: HDFS-11860
> URL: https://issues.apache.org/jira/browse/HDFS-11860
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-11860-HDFS-7240.001.patch
>
>
> This was caught randomly in Jenkins runs. After debugging, the cause was 
> found to be the logic below: when the two random indexes happen to be the 
> same, the node is returned without being removed from the healthy list for 
> the next round of selection. As a result, duplicated datanodes can be chosen 
> for the pipeline, and the machine list ends up smaller than expected. I will 
> post a fix soon.
> {code}
> SCMContainerPlacementCapacity#chooseNode
>  // There is a possibility that both numbers will be same.
>  // if that is so, we just return the node.
>  if (firstNodeNdx == secondNodeNdx) {
>   return healthyNodes.get(firstNodeNdx);
>  }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11817) A faulty node can cause a lease leak and NPE on accessing data

2017-05-22 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019886#comment-16019886
 ] 

Kihwal Lee commented on HDFS-11817:
---

Two test failures are real.
{noformat}
TestRetryCacheWithHA#testCheckLease
TestLeaseManager#testCheckLease
{noformat}

{{TestLeaseManager}} was failing because I made failed lease recoveries retry. 
In the new patch, I made it give up and remove the lease if an IOException is 
thrown because of a bad path. For all other cases, it is correct to retry, 
since those failures are likely transient conditions.
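
In pseudo-Java, the retry-vs-remove policy above looks roughly like this 
(every helper name below is illustrative, not the actual patch):
{code}
// LeaseManager#checkLeases-style loop (sketch; helper names are hypothetical):
for (Lease lease : expiredLeases) {
  for (String path : lease.getPaths()) {
    try {
      recoverLease(path, lease);
    } catch (IOException e) {
      if (isBadPath(e)) {
        // Recovery on a bad path can never succeed: give up, remove the lease.
        removeLease(lease, path);
      }
      // Otherwise keep the lease so recovery is retried on the next scan;
      // the failure is likely a transient condition.
    }
  }
}
{code}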

{{TestRetryCacheWithHA}} was failing because the test passed fake storage IDs 
to {{updatePipeline()}}, even though it has the real storage IDs available. I 
updated the test.

I will upload the trunk version shortly with these changes.

> A faulty node can cause a lease leak and NPE on accessing data
> --
>
> Key: HDFS-11817
> URL: https://issues.apache.org/jira/browse/HDFS-11817
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-11817.branch-2.patch, hdfs-11817_supplement.txt
>
>
> When the namenode performs a lease recovery for a failed write, 
> {{commitBlockSynchronization()}} will fail if none of the new targets has 
> sent a received-IBR. At this point, the data is inaccessible, as the 
> namenode will throw a {{NullPointerException}} upon {{getBlockLocations()}}.
> The lease recovery will be retried by the namenode in about an hour. If the 
> nodes are faulty (usually when there is only one new target), they may not 
> send a block report until this point. If this happens, lease recovery throws 
> an {{AlreadyBeingCreatedException}}, which causes the LeaseManager to simply 
> remove the lease without finalizing the inode.
> This results in an inconsistent lease state. The inode stays 
> under-construction, but no more lease recovery is attempted. A manual lease 
> recovery is also not allowed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11863) Document missing metrics for blocks count in pending IBR

2017-05-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019876#comment-16019876
 ] 

Hadoop QA commented on HDFS-11863:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 16m 30s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-11863 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12869282/HDFS-11863.002.patch |
| Optional Tests |  asflicense  mvnsite  |
| uname | Linux 533893e94615 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 
15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / b6f66b0 |
| modules | C: hadoop-common-project/hadoop-common U: 
hadoop-common-project/hadoop-common |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19542/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Document missing metrics for blocks count in pending IBR
> 
>
> Key: HDFS-11863
> URL: https://issues.apache.org/jira/browse/HDFS-11863
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.0.0-alpha3
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
> Attachments: HDFS-11863.001.patch, HDFS-11863.002.patch
>
>
> HDFS-11534 introduced some useful metrics for blocks count in pending IBR.
> {noformat}
>   @Metric("Count of blocks in pending IBR")
>   private MutableGaugeLong blocksInPendingIBR;
>   @Metric("Count of blocks at receiving status in pending IBR")
>   private MutableGaugeLong blocksReceivingInPendingIBR;
>   @Metric("Count of blocks at received status in pending IBR")
>   private MutableGaugeLong blocksReceivedInPendingIBR;
> {noformat}
> It will be nice to have in the metrics documentation.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11863) Document missing metrics for blocks count in pending IBR

2017-05-22 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019869#comment-16019869
 ] 

Hanisha Koneru commented on HDFS-11863:
---

Thanks [~linyiqun] for the patch. Patch v02 LGTM.

> Document missing metrics for blocks count in pending IBR
> 
>
> Key: HDFS-11863
> URL: https://issues.apache.org/jira/browse/HDFS-11863
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.0.0-alpha3
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
> Attachments: HDFS-11863.001.patch, HDFS-11863.002.patch
>
>
> HDFS-11534 introduced some useful metrics for blocks count in pending IBR.
> {noformat}
>   @Metric("Count of blocks in pending IBR")
>   private MutableGaugeLong blocksInPendingIBR;
>   @Metric("Count of blocks at receiving status in pending IBR")
>   private MutableGaugeLong blocksReceivingInPendingIBR;
>   @Metric("Count of blocks at received status in pending IBR")
>   private MutableGaugeLong blocksReceivedInPendingIBR;
> {noformat}
> It will be nice to have in the metrics documentation.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11850) Ozone: Stack Overflow in XceiverClientManager because of race condition in accessing openClient

2017-05-22 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-11850:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to the feature branch.
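
The root cause, per the description quoted below, is unlocked access to the 
shared client cache in acquireClient, which lets Guava's removal notification 
recurse back into put(). A minimal sketch of the kind of locking fix involved, 
with illustrative identifiers (not the committed diff):
{code}
// XceiverClientManager#acquireClient (sketch; identifiers are illustrative):
private final Cache<String, XceiverClient> openClient =
    CacheBuilder.newBuilder()
        .maximumSize(100)  // illustrative sizing
        .build();

public XceiverClient acquireClient(Pipeline pipeline) throws IOException {
  // Serialize cache access so the onRemoval -> put -> cleanup -> onRemoval
  // recursion shown in the stack trace below cannot happen.
  synchronized (openClient) {
    String key = pipeline.getContainerName();   // hypothetical accessor
    XceiverClient client = openClient.getIfPresent(key);
    if (client == null) {
      client = newClient(pipeline);             // hypothetical helper
      openClient.put(key, client);
    }
    return client;
  }
}
{code}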

> Ozone: Stack Overflow in XceiverClientManager because of race condition in 
> accessing openClient
> ---
>
> Key: HDFS-11850
> URL: https://issues.apache.org/jira/browse/HDFS-11850
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
> Attachments: HDFS-11850-HDFS-7240.001.patch
>
>
> There is a possible race condition in accessing the open client hash, caused 
> by unlocked access to the hash in acquireClient.
> This can cause a stack overflow and, in all probability, also leak clients:
> {code}
> at 
> com.google.common.cache.LocalCache$Segment.put(LocalCache.java:3019)
> at com.google.common.cache.LocalCache.put(LocalCache.java:4365)
> at 
> com.google.common.cache.LocalCache$LocalManualCache.put(LocalCache.java:5077)
> at 
> org.apache.hadoop.scm.XceiverClientManager$1.onRemoval(XceiverClientManager.java:85)
> at 
> com.google.common.cache.LocalCache.processPendingNotifications(LocalCache.java:1966)
> at 
> com.google.common.cache.LocalCache$Segment.runUnlockedCleanup(LocalCache.java:3650)
> at 
> com.google.common.cache.LocalCache$Segment.postWriteCleanup(LocalCache.java:3626)
> at 
> com.google.common.cache.LocalCache$Segment.put(LocalCache.java:3019)
> at com.google.common.cache.LocalCache.put(LocalCache.java:4365)
> at 
> com.google.common.cache.LocalCache$LocalManualCache.put(LocalCache.java:5077)
> at 
> org.apache.hadoop.scm.XceiverClientManager$1.onRemoval(XceiverClientManager.java:85)
> at 
> com.google.common.cache.LocalCache.processPendingNotifications(LocalCache.java:1966)
> at 
> com.google.common.cache.LocalCache$Segment.runUnlockedCleanup(LocalCache.java:3650)
> at 
> com.google.common.cache.LocalCache$Segment.postWriteCleanup(LocalCache.java:3626)
> at 
> com.google.common.cache.LocalCache$Segment.put(LocalCache.java:3019)
> at com.google.common.cache.LocalCache.put(LocalCache.java:4365)
> at 
> com.google.common.cache.LocalCache$LocalManualCache.put(LocalCache.java:5077)
> at 
> org.apache.hadoop.scm.XceiverClientManager$1.onRemoval(XceiverClientManager.java:85)
> at 
> com.google.common.cache.LocalCache.processPendingNotifications(LocalCache.java:1966)
> at 
> com.google.common.cache.LocalCache$Segment.runUnlockedCleanup(LocalCache.java:3650)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11850) Ozone: Stack Overflow in XceiverClientManager because of race condition in accessing openClient

2017-05-22 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019839#comment-16019839
 ] 

Chen Liang commented on HDFS-11850:
---

Thanks [~msingh] for working on this! The v001 patch LGTM; I will commit it shortly.

> Ozone: Stack Overflow in XceiverClientManager because of race condition in 
> accessing openClient
> ---
>
> Key: HDFS-11850
> URL: https://issues.apache.org/jira/browse/HDFS-11850
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
> Attachments: HDFS-11850-HDFS-7240.001.patch
>
>
> There is a possible race condition in accessing the open client hash, caused 
> by unlocked access to the hash in acquireClient.
> This can cause a stack overflow and, in all probability, also leak clients:
> {code}
> at 
> com.google.common.cache.LocalCache$Segment.put(LocalCache.java:3019)
> at com.google.common.cache.LocalCache.put(LocalCache.java:4365)
> at 
> com.google.common.cache.LocalCache$LocalManualCache.put(LocalCache.java:5077)
> at 
> org.apache.hadoop.scm.XceiverClientManager$1.onRemoval(XceiverClientManager.java:85)
> at 
> com.google.common.cache.LocalCache.processPendingNotifications(LocalCache.java:1966)
> at 
> com.google.common.cache.LocalCache$Segment.runUnlockedCleanup(LocalCache.java:3650)
> at 
> com.google.common.cache.LocalCache$Segment.postWriteCleanup(LocalCache.java:3626)
> at 
> com.google.common.cache.LocalCache$Segment.put(LocalCache.java:3019)
> at com.google.common.cache.LocalCache.put(LocalCache.java:4365)
> at 
> com.google.common.cache.LocalCache$LocalManualCache.put(LocalCache.java:5077)
> at 
> org.apache.hadoop.scm.XceiverClientManager$1.onRemoval(XceiverClientManager.java:85)
> at 
> com.google.common.cache.LocalCache.processPendingNotifications(LocalCache.java:1966)
> at 
> com.google.common.cache.LocalCache$Segment.runUnlockedCleanup(LocalCache.java:3650)
> at 
> com.google.common.cache.LocalCache$Segment.postWriteCleanup(LocalCache.java:3626)
> at 
> com.google.common.cache.LocalCache$Segment.put(LocalCache.java:3019)
> at com.google.common.cache.LocalCache.put(LocalCache.java:4365)
> at 
> com.google.common.cache.LocalCache$LocalManualCache.put(LocalCache.java:5077)
> at 
> org.apache.hadoop.scm.XceiverClientManager$1.onRemoval(XceiverClientManager.java:85)
> at 
> com.google.common.cache.LocalCache.processPendingNotifications(LocalCache.java:1966)
> at 
> com.google.common.cache.LocalCache$Segment.runUnlockedCleanup(LocalCache.java:3650)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11860) Ozone: SCM: SCMContainerPlacementCapacity#chooseNode sometimes does not remove chosen node from healthy list.

2017-05-22 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019827#comment-16019827
 ] 

Chen Liang commented on HDFS-11860:
---

Thanks [~xyao] for the debugging and the fix! The v001 patch LGTM with that 
checkstyle warning fixed. I haven't looked into the failed tests; they seem 
like they could be related, though. Could you please verify?

> Ozone: SCM: SCMContainerPlacementCapacity#chooseNode sometimes does not 
> remove chosen node from healthy list.
> -
>
> Key: HDFS-11860
> URL: https://issues.apache.org/jira/browse/HDFS-11860
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-11860-HDFS-7240.001.patch
>
>
> This was caught randomly in Jenkins runs. After debugging, the cause was 
> found to be the logic below: when the two random indexes happen to be the 
> same, the node is returned without being removed from the healthy list for 
> the next round of selection. As a result, duplicated datanodes can be chosen 
> for the pipeline, and the machine list ends up smaller than expected. I will 
> post a fix soon.
> {code}
> SCMContainerPlacementCapacity#chooseNode
>  // There is a possibility that both numbers will be same.
>  // if that is so, we just return the node.
>  if (firstNodeNdx == secondNodeNdx) {
>   return healthyNodes.get(firstNodeNdx);
>  }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-11539) Block Storage : configurable max cache size

2017-05-22 Thread Mukul Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh reassigned HDFS-11539:


Assignee: Mukul Kumar Singh

> Block Storage : configurable max cache size
> ---
>
> Key: HDFS-11539
> URL: https://issues.apache.org/jira/browse/HDFS-11539
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Chen Liang
>Assignee: Mukul Kumar Singh
>
> Currently, there is no max size limit for CBlock's local cache. In theory, 
> this means the cache can grow unbounded. We should make the max size 
> configurable.
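
As a rough illustration of what such a knob typically looks like in Hadoop 
code, here is a sketch; the key name and default below are assumptions for 
illustration only, not taken from any patch:
{code}
// Hypothetical configuration key (not the actual CBlock key):
public static final String DFS_CBLOCK_CACHE_MAX_SIZE_GB_KEY =
    "dfs.cblock.cache.max.size.gb";
public static final int DFS_CBLOCK_CACHE_MAX_SIZE_GB_DEFAULT = 32;

// Reading it and converting to bytes:
long maxCacheBytes = conf.getInt(DFS_CBLOCK_CACHE_MAX_SIZE_GB_KEY,
    DFS_CBLOCK_CACHE_MAX_SIZE_GB_DEFAULT) * 1024L * 1024L * 1024L;
{code}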



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-11863) Document missing metrics for blocks count in pending IBR

2017-05-22 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019695#comment-16019695
 ] 

Yiqun Lin edited comment on HDFS-11863 at 5/22/17 3:20 PM:
---

Thanks [~brahmareddy] for the review and comments.
{quote}
i) Can we place the update before the EC metrics (EcReconstructionBytesRead), 
matching the code order?
{quote}
Done.
{quote}
ii) There are some more metrics missing; maybe we can raise follow-up JIRAs 
for those?
{quote}
Agreed on this. I have assigned HDFS-11864 to myself. BTW, have you found any 
other missing metrics that need to be documented across the HDFS project, 
[~brahmareddy]?

Attached the updated patch. Please review. Thanks.


was (Author: linyiqun):
Thanks [~brahmareddy] for the review and comments.
{quote}
i) Can we place the update before the EC metrics (EcReconstructionBytesRead), 
matching the code order?
{quote}
Done.
{quote}
ii) There are some more metrics missing; maybe we can raise follow-up JIRAs 
for those?
{quote}
Agreed on this. I have assigned HDFS-11864 to myself. BTW, have you found any 
other missing metrics that need to be documented, [~brahmareddy]?

Attached the updated patch. Please review. Thanks.

> Document missing metrics for blocks count in pending IBR
> 
>
> Key: HDFS-11863
> URL: https://issues.apache.org/jira/browse/HDFS-11863
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.0.0-alpha3
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
> Attachments: HDFS-11863.001.patch, HDFS-11863.002.patch
>
>
> HDFS-11534 introduced some useful metrics for blocks count in pending IBR.
> {noformat}
>   @Metric("Count of blocks in pending IBR")
>   private MutableGaugeLong blocksInPendingIBR;
>   @Metric("Count of blocks at receiving status in pending IBR")
>   private MutableGaugeLong blocksReceivingInPendingIBR;
>   @Metric("Count of blocks at received status in pending IBR")
>   private MutableGaugeLong blocksReceivedInPendingIBR;
> {noformat}
> It will be nice to have in the metrics documentation.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11863) Document missing metrics for blocks count in pending IBR

2017-05-22 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-11863:
-
Attachment: HDFS-11863.002.patch

Thanks [~brahmareddy] for the review and comments.
{quote}
i) Can we place the update before the EC metrics (EcReconstructionBytesRead), 
matching the code order?
{quote}
Done.
{quote}
ii) There are some more metrics missing; maybe we can raise follow-up JIRAs 
for those?
{quote}
Agreed on this. I have assigned HDFS-11864 to myself. BTW, have you found any 
other missing metrics that need to be documented, [~brahmareddy]?

Attached the updated patch. Please review. Thanks.

> Document missing metrics for blocks count in pending IBR
> 
>
> Key: HDFS-11863
> URL: https://issues.apache.org/jira/browse/HDFS-11863
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.0.0-alpha3
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
> Attachments: HDFS-11863.001.patch, HDFS-11863.002.patch
>
>
> HDFS-11534 introduced some useful metrics for blocks count in pending IBR.
> {noformat}
>   @Metric("Count of blocks in pending IBR")
>   private MutableGaugeLong blocksInPendingIBR;
>   @Metric("Count of blocks at receiving status in pending IBR")
>   private MutableGaugeLong blocksReceivingInPendingIBR;
>   @Metric("Count of blocks at received status in pending IBR")
>   private MutableGaugeLong blocksReceivedInPendingIBR;
> {noformat}
> It will be nice to have in the metrics documentation.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-11864) Document Metrics to track usage of memory for writes

2017-05-22 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin reassigned HDFS-11864:


Assignee: Yiqun Lin

> Document  Metrics to track usage of memory for writes 
> --
>
> Key: HDFS-11864
> URL: https://issues.apache.org/jira/browse/HDFS-11864
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Reporter: Brahma Reddy Battula
>Assignee: Yiqun Lin
>
> HDFS-7129 introduced the following metrics, which are not documented.
> {noformat}
> // RamDisk metrics on read/write
> @Metric MutableCounterLong ramDiskBlocksWrite;
> @Metric MutableCounterLong ramDiskBlocksWriteFallback;
> @Metric MutableCounterLong ramDiskBytesWrite;
> @Metric MutableCounterLong ramDiskBlocksReadHits;
>   
> // RamDisk metrics on eviction
> @Metric MutableCounterLong ramDiskBlocksEvicted;
> @Metric MutableCounterLong ramDiskBlocksEvictedWithoutRead;
> @Metric MutableRate ramDiskBlocksEvictionWindowMs;
> final MutableQuantiles[]   ramDiskBlocksEvictionWindowMsQuantiles;
>   
>   
> // RamDisk metrics on lazy persist
> @Metric MutableCounterLong ramDiskBlocksLazyPersisted;
> @Metric MutableCounterLong ramDiskBlocksDeletedBeforeLazyPersisted;
> @Metric MutableCounterLong ramDiskBytesLazyPersisted;
> @Metric MutableRate ramDiskBlocksLazyPersistWindowMs;
> final MutableQuantiles[]   ramDiskBlocksLazyPersistWindowMsQuantiles;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11778) Ozone: KSM: add getBucketInfo

2017-05-22 Thread Nandakumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nandakumar updated HDFS-11778:
--
Attachment: HDFS-11778-HDFS-7240.000.patch

> Ozone: KSM: add getBucketInfo
> -
>
> Key: HDFS-11778
> URL: https://issues.apache.org/jira/browse/HDFS-11778
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Anu Engineer
>Assignee: Nandakumar
> Attachments: HDFS-11778-HDFS-7240.000.patch
>
>
> Returns the bucket information if the bucket exists.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


