[jira] [Updated] (HDFS-12840) Creating a replicated file in an EC zone is not correctly serialized in EditLogs
[ https://issues.apache.org/jira/browse/HDFS-12840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei (Eddy) Xu updated HDFS-12840:
---------------------------------
    Labels: hdfs-ec-3.0-must-do  (was: )

> Creating a replicated file in an EC zone is not correctly serialized in EditLogs
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-12840
>                 URL: https://issues.apache.org/jira/browse/HDFS-12840
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: erasure-coding
>    Affects Versions: 3.0.0-beta1
>            Reporter: Lei (Eddy) Xu
>            Assignee: Lei (Eddy) Xu
>            Priority: Blocker
>              Labels: hdfs-ec-3.0-must-do
>         Attachments: HDFS-12840.00.patch, HDFS-12840.reprod.patch
>
>
> When creating a replicated file in an existing EC zone, the edit log does not
> differentiate it from an EC file. When {{FSEditLogLoader}} replays the edits,
> the file is treated as an EC file; as a result, it crashes the NN because the
> blocks of this file are replicated, which does not match the {{INode}}.
> {noformat}
> ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered
> exception on operation AddBlockOp [path=/system/balancer.id,
> penultimateBlock=NULL, lastBlock=blk_1073743259_2455, RpcClientId=,
> RpcCallId=-2]
> java.lang.IllegalArgumentException: reportedBlock is not striped
>         at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStriped.addStorage(BlockInfoStriped.java:118)
>         at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.addBlock(DatanodeStorageInfo.java:256)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlock(BlockManager.java:3141)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlockUnderConstruction(BlockManager.java:3068)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:3864)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessages(BlockManager.java:2916)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessagesForBlock(BlockManager.java:2903)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.addNewBlock(FSEditLogLoader.java:1069)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:532)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
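[Editor's note] The failure mode in the report above can be sketched in a few lines. This is a minimal illustrative model, not the actual HDFS classes: it only mirrors the shape of the bug, where the replaying loader infers "striped" from the enclosing EC zone because the edit log carries no per-file replication flag, and a replicated block then trips the precondition in the striped code path. All class and method names below are hypothetical.

```java
// Stand-in for a block record: only the striped/contiguous distinction matters here.
class Block {
    final boolean striped;
    Block(boolean striped) { this.striped = striped; }
}

// Mirrors the checkArgument in BlockInfoStriped.addStorage: a replicated
// (contiguous) block must never reach the striped code path.
class StripedBlockStore {
    void addStorage(Block reported) {
        if (!reported.striped) {
            throw new IllegalArgumentException("reportedBlock is not striped");
        }
    }
}

public class EditReplaySketch {
    // Buggy inference: the loader decides the code path from the zone
    // instead of a per-file flag recorded in the edit log.
    static String replay(boolean inEcZone, Block reported) {
        if (inEcZone) {
            new StripedBlockStore().addStorage(reported);
            return "striped path";
        }
        return "contiguous path";
    }

    public static void main(String[] args) {
        // A replicated file inside an EC zone crashes the replay:
        try {
            replay(true, new Block(false));
            throw new AssertionError("expected IllegalArgumentException");
        } catch (IllegalArgumentException expected) {
            System.out.println("replay failed as described: " + expected.getMessage());
        }
    }
}
```

The fix direction implied by the report is to persist the replication/EC distinction per file in the edit log so replay does not have to infer it from the zone.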
[jira] [Updated] (HDFS-12840) Creating a replicated file in an EC zone is not correctly serialized in EditLogs
[ https://issues.apache.org/jira/browse/HDFS-12840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei (Eddy) Xu updated HDFS-12840:
---------------------------------
    Status: Patch Available  (was: Open)

> Creating a replicated file in an EC zone is not correctly serialized in EditLogs
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-12840
>                 URL: https://issues.apache.org/jira/browse/HDFS-12840
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: erasure-coding
>    Affects Versions: 3.0.0-beta1
>            Reporter: Lei (Eddy) Xu
>            Assignee: Lei (Eddy) Xu
>            Priority: Blocker
>         Attachments: HDFS-12840.00.patch, HDFS-12840.reprod.patch
>
>
> When creating a replicated file in an existing EC zone, the edit log does not
> differentiate it from an EC file. When {{FSEditLogLoader}} replays the edits,
> the file is treated as an EC file; as a result, it crashes the NN because the
> blocks of this file are replicated, which does not match the {{INode}}.
> {noformat}
> ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered
> exception on operation AddBlockOp [path=/system/balancer.id,
> penultimateBlock=NULL, lastBlock=blk_1073743259_2455, RpcClientId=,
> RpcCallId=-2]
> java.lang.IllegalArgumentException: reportedBlock is not striped
>         at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStriped.addStorage(BlockInfoStriped.java:118)
>         at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.addBlock(DatanodeStorageInfo.java:256)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlock(BlockManager.java:3141)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlockUnderConstruction(BlockManager.java:3068)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:3864)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessages(BlockManager.java:2916)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessagesForBlock(BlockManager.java:2903)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.addNewBlock(FSEditLogLoader.java:1069)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:532)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397)
> {noformat}
[jira] [Updated] (HDFS-12681) Make HdfsLocatedFileStatus a subtype of LocatedFileStatus
[ https://issues.apache.org/jira/browse/HDFS-12681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HDFS-12681:
---------------------------------
    Summary: Make HdfsLocatedFileStatus a subtype of LocatedFileStatus  (was: Fold HdfsLocatedFileStatus into HdfsFileStatus)

> Make HdfsLocatedFileStatus a subtype of LocatedFileStatus
> ---------------------------------------------------------
>
>                 Key: HDFS-12681
>                 URL: https://issues.apache.org/jira/browse/HDFS-12681
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Chris Douglas
>            Assignee: Chris Douglas
>            Priority: Minor
>         Attachments: HDFS-12681.00.patch, HDFS-12681.01.patch, HDFS-12681.02.patch,
>                      HDFS-12681.03.patch, HDFS-12681.04.patch, HDFS-12681.05.patch,
>                      HDFS-12681.06.patch, HDFS-12681.07.patch, HDFS-12681.08.patch,
>                      HDFS-12681.09.patch, HDFS-12681.10.patch, HDFS-12681.11.patch,
>                      HDFS-12681.12.patch, HDFS-12681.13.patch
>
>
> {{HdfsLocatedFileStatus}} is a subtype of {{HdfsFileStatus}}, but not of
> {{LocatedFileStatus}}. Conversion requires copying common fields and shedding
> unknown data. It would be cleaner and sufficient for {{HdfsFileStatus}} to
> extend {{LocatedFileStatus}}.
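[Editor's note] The class relationship the issue proposes can be sketched as follows. This is a deliberately simplified, hypothetical mirror of the hierarchy (the real classes live in org.apache.hadoop.fs and org.apache.hadoop.hdfs.protocol and carry many more fields); it only shows why the subtype arrangement removes the copy-fields conversion step described above.

```java
// Base status: path plus other metadata (elided).
class FileStatus {
    final String path;
    FileStatus(String p) { path = p; }
}

// Status with block locations attached.
class LocatedFileStatus extends FileStatus {
    final long[] blockLocations;          // stand-in for BlockLocation[]
    LocatedFileStatus(String p, long[] locs) { super(p); blockLocations = locs; }
}

// Before the change, the HDFS located variant extended only HdfsFileStatus,
// so obtaining a LocatedFileStatus meant copying common fields and dropping
// HDFS-specific data. As a subtype, it simply *is* a LocatedFileStatus.
class HdfsLocatedFileStatus extends LocatedFileStatus {
    HdfsLocatedFileStatus(String p, long[] locs) { super(p, locs); }
}

public class SubtypeSketch {
    public static void main(String[] args) {
        // Usable directly wherever a LocatedFileStatus is expected; no conversion copy.
        LocatedFileStatus s = new HdfsLocatedFileStatus("/a", new long[]{0L});
        System.out.println(s.path);
    }
}
```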
[jira] [Updated] (HDFS-12681) Fold HdfsLocatedFileStatus into HdfsFileStatus
[ https://issues.apache.org/jira/browse/HDFS-12681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HDFS-12681:
---------------------------------
    Attachment: HDFS-12681.13.patch

Failing tests are due to resource exhaustion. Updated the patch to fix some checkstyle issues and put the findbugs suppression in the correct file.

> Fold HdfsLocatedFileStatus into HdfsFileStatus
> ----------------------------------------------
>
>                 Key: HDFS-12681
>                 URL: https://issues.apache.org/jira/browse/HDFS-12681
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Chris Douglas
>            Assignee: Chris Douglas
>            Priority: Minor
>         Attachments: HDFS-12681.00.patch, HDFS-12681.01.patch, HDFS-12681.02.patch,
>                      HDFS-12681.03.patch, HDFS-12681.04.patch, HDFS-12681.05.patch,
>                      HDFS-12681.06.patch, HDFS-12681.07.patch, HDFS-12681.08.patch,
>                      HDFS-12681.09.patch, HDFS-12681.10.patch, HDFS-12681.11.patch,
>                      HDFS-12681.12.patch, HDFS-12681.13.patch
>
>
> {{HdfsLocatedFileStatus}} is a subtype of {{HdfsFileStatus}}, but not of
> {{LocatedFileStatus}}. Conversion requires copying common fields and shedding
> unknown data. It would be cleaner and sufficient for {{HdfsFileStatus}} to
> extend {{LocatedFileStatus}}.
[jira] [Updated] (HDFS-12740) SCM should support a RPC to share the cluster Id with KSM and DataNodes
[ https://issues.apache.org/jira/browse/HDFS-12740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-12740:
---------------------------------------
    Attachment: HDFS-12740-HDFS-7240.005.patch

> SCM should support a RPC to share the cluster Id with KSM and DataNodes
> -----------------------------------------------------------------------
>
>                 Key: HDFS-12740
>                 URL: https://issues.apache.org/jira/browse/HDFS-12740
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: HDFS-7240
>            Reporter: Shashikant Banerjee
>            Assignee: Shashikant Banerjee
>             Fix For: HDFS-7240
>
>         Attachments: HDFS-12740-HDFS-7240.001.patch, HDFS-12740-HDFS-7240.002.patch,
>                      HDFS-12740-HDFS-7240.003.patch, HDFS-12740-HDFS-7240.004.patch,
>                      HDFS-12740-HDFS-7240.005.patch
>
>
> When the ozone cluster is first created, the SCM --init command will generate
> a cluster id as well as an SCM id and persist them locally. The same cluster id
> and SCM id will be shared with KSM during KSM initialization and with
> datanodes during datanode registration.
[jira] [Updated] (HDFS-12740) SCM should support a RPC to share the cluster Id with KSM and DataNodes
[ https://issues.apache.org/jira/browse/HDFS-12740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-12740:
---------------------------------------
    Attachment:     (was: HDFS-12740-HDFS-7240.005.patch)

> SCM should support a RPC to share the cluster Id with KSM and DataNodes
> -----------------------------------------------------------------------
>
>                 Key: HDFS-12740
>                 URL: https://issues.apache.org/jira/browse/HDFS-12740
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: HDFS-7240
>            Reporter: Shashikant Banerjee
>            Assignee: Shashikant Banerjee
>             Fix For: HDFS-7240
>
>         Attachments: HDFS-12740-HDFS-7240.001.patch, HDFS-12740-HDFS-7240.002.patch,
>                      HDFS-12740-HDFS-7240.003.patch, HDFS-12740-HDFS-7240.004.patch
>
>
> When the ozone cluster is first created, the SCM --init command will generate
> a cluster id as well as an SCM id and persist them locally. The same cluster id
> and SCM id will be shared with KSM during KSM initialization and with
> datanodes during datanode registration.
[jira] [Updated] (HDFS-12740) SCM should support a RPC to share the cluster Id with KSM and DataNodes
[ https://issues.apache.org/jira/browse/HDFS-12740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-12740:
---------------------------------------
    Attachment: HDFS-12740-HDFS-7240.005.patch

Thanks [~nandakumar131] for the review comments. Patch v5 addresses the review comments. Please have a look.

> SCM should support a RPC to share the cluster Id with KSM and DataNodes
> -----------------------------------------------------------------------
>
>                 Key: HDFS-12740
>                 URL: https://issues.apache.org/jira/browse/HDFS-12740
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: HDFS-7240
>            Reporter: Shashikant Banerjee
>            Assignee: Shashikant Banerjee
>             Fix For: HDFS-7240
>
>         Attachments: HDFS-12740-HDFS-7240.001.patch, HDFS-12740-HDFS-7240.002.patch,
>                      HDFS-12740-HDFS-7240.003.patch, HDFS-12740-HDFS-7240.004.patch,
>                      HDFS-12740-HDFS-7240.005.patch
>
>
> When the ozone cluster is first created, the SCM --init command will generate
> a cluster id as well as an SCM id and persist them locally. The same cluster id
> and SCM id will be shared with KSM during KSM initialization and with
> datanodes during datanode registration.
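[Editor's note] The RPC surface described in HDFS-12740 can be sketched as below. All names here are hypothetical stand-ins, not the real Ozone API: SCM generates and persists a cluster id and an SCM id at --init time, then serves both over one call that KSM reads at initialization and datanodes read at registration.

```java
import java.util.UUID;

// Immutable pair of identifiers generated once at SCM --init.
class ScmInfo {
    final String clusterId;
    final String scmId;
    ScmInfo(String clusterId, String scmId) {
        this.clusterId = clusterId;
        this.scmId = scmId;
    }
}

// The shared-id RPC: one call, both ids.
interface ScmInfoProtocol {
    ScmInfo getScmInfo();
}

public class ScmInfoSketch implements ScmInfoProtocol {
    // Generated once and held; a real SCM would persist this locally.
    private final ScmInfo info =
        new ScmInfo(UUID.randomUUID().toString(), UUID.randomUUID().toString());

    @Override
    public ScmInfo getScmInfo() { return info; }

    public static void main(String[] args) {
        ScmInfoProtocol scm = new ScmInfoSketch();
        // KSM init and datanode registration both observe the same ids:
        System.out.println(
            scm.getScmInfo().clusterId.equals(scm.getScmInfo().clusterId));
    }
}
```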
[jira] [Updated] (HDFS-12838) Ozone: Optimize number of allocated block rpc by aggregating multiple block allocation requests
[ https://issues.apache.org/jira/browse/HDFS-12838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mukul Kumar Singh updated HDFS-12838:
-------------------------------------
    Status: Patch Available  (was: Open)

> Ozone: Optimize number of allocated block rpc by aggregating multiple block
> allocation requests
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-12838
>                 URL: https://issues.apache.org/jira/browse/HDFS-12838
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone
>    Affects Versions: HDFS-7240
>            Reporter: Mukul Kumar Singh
>            Assignee: Mukul Kumar Singh
>             Fix For: HDFS-7240
>
>         Attachments: HDFS-12838-HDFS-7240.001.patch
>
>
> Currently KeySpaceManager allocates multiple blocks by sending multiple block
> allocation requests over the RPC. This can be optimized to aggregate multiple
> block allocation requests over one rpc.
> {code}
> while (requestedSize > 0) {
>   long allocateSize = Math.min(scmBlockSize, requestedSize);
>   AllocatedBlock allocatedBlock =
>       scmBlockClient.allocateBlock(allocateSize, type, factor);
>   KsmKeyLocationInfo subKeyInfo = new KsmKeyLocationInfo.Builder()
>       .setContainerName(allocatedBlock.getPipeline().getContainerName())
>       .setBlockID(allocatedBlock.getKey())
>       .setShouldCreateContainer(allocatedBlock.getCreateContainer())
>       .setIndex(idx++)
>       .setLength(allocateSize)
>       .setOffset(0)
>       .build();
>   locations.add(subKeyInfo);
>   requestedSize -= allocateSize;
> }
> {code}
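[Editor's note] The optimization the issue describes can be sketched as follows. This is a hedged, self-contained model, not the actual SCM block client: the quoted while-loop issues one allocateBlock RPC per block, whereas a batched call asks for all blocks at once, so a key needing N blocks costs one round trip instead of N.

```java
import java.util.ArrayList;
import java.util.List;

public class AggregatedAllocSketch {
    int rpcCount = 0;   // counts round trips to the (hypothetical) block server

    // One round trip that allocates 'count' block ids at once.
    List<Long> allocateBlocks(int count) {
        rpcCount++;
        List<Long> ids = new ArrayList<>();
        for (long i = 0; i < count; i++) {
            ids.add(i);
        }
        return ids;
    }

    // ceil(requestedSize / scmBlockSize): how many blocks the key needs.
    static int blocksNeeded(long requestedSize, long scmBlockSize) {
        return (int) ((requestedSize + scmBlockSize - 1) / scmBlockSize);
    }

    public static void main(String[] args) {
        long scmBlockSize = 256L * 1024 * 1024;   // 256 MB
        long requested = 1000L * 1024 * 1024;     // ~1 GB key -> 4 blocks

        AggregatedAllocSketch client = new AggregatedAllocSketch();
        List<Long> blocks =
            client.allocateBlocks(blocksNeeded(requested, scmBlockSize));

        // 4 blocks obtained with a single RPC instead of 4:
        System.out.println(blocks.size() + " blocks, " + client.rpcCount + " rpc");
    }
}
```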
[jira] [Commented] (HDFS-12832) INode.getFullPathName may throw ArrayIndexOutOfBoundsException lead to NameNode exit
[ https://issues.apache.org/jira/browse/HDFS-12832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260198#comment-16260198 ]

DENG FEI commented on HDFS-12832:
---------------------------------

[~xkrogen] The stack trace has been uploaded; it did indeed happen in
{{ReplicationWork#chooseTargets()}}. And you are right: {{INode#getPathComponents()}}
has the same problem under a concurrent {{move}}, not only {{rename}}.

> INode.getFullPathName may throw ArrayIndexOutOfBoundsException lead to
> NameNode exit
> ----------------------------------------------------------------------
>
>                 Key: HDFS-12832
>                 URL: https://issues.apache.org/jira/browse/HDFS-12832
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.7.4, 3.0.0-beta1
>            Reporter: DENG FEI
>            Priority: Critical
>         Attachments: HDFS-12832-trunk-001.patch, exception.log
>
>
> {code:title=INode.java|borderStyle=solid}
> public String getFullPathName() {
>   // Get the full path name of this inode.
>   if (isRoot()) {
>     return Path.SEPARATOR;
>   }
>   // compute size of needed bytes for the path
>   int idx = 0;
>   for (INode inode = this; inode != null; inode = inode.getParent()) {
>     // add component + delimiter (if not tail component)
>     idx += inode.getLocalNameBytes().length + (inode != this ? 1 : 0);
>   }
>   byte[] path = new byte[idx];
>   for (INode inode = this; inode != null; inode = inode.getParent()) {
>     if (inode != this) {
>       path[--idx] = Path.SEPARATOR_CHAR;
>     }
>     byte[] name = inode.getLocalNameBytes();
>     idx -= name.length;
>     System.arraycopy(name, 0, path, idx, name.length);
>   }
>   return DFSUtil.bytes2String(path);
> }
> {code}
> We found an ArrayIndexOutOfBoundsException at
> _{color:#707070}System.arraycopy(name, 0, path, idx, name.length){color}_
> while the ReplicaMonitor was working, and the NameNode quit.
> It seems the two loops are not synchronized; the path's length changed
> between them.
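[Editor's note] The hazard in the quoted code is that the parent chain is walked twice: once to size the byte[] and once to fill it. If a concurrent rename or move changes a component's length between the two passes, the fill pass overruns or underruns the array. One defensive shape (a hypothetical illustration, not the committed fix) is to snapshot each component exactly once in a single pass and build the path from the snapshot:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal inode stand-in: name and parent are volatile because a concurrent
// rename/move may change them while the path is being assembled.
class Inode {
    volatile String name;
    volatile Inode parent;
    Inode(String name, Inode parent) { this.name = name; this.parent = parent; }

    // Single-pass variant: each component is read exactly once, so a rename
    // between "measure" and "copy" passes can no longer cause an overrun.
    String fullPathSinglePass() {
        Deque<String> parts = new ArrayDeque<>();
        for (Inode i = this; i != null; i = i.parent) {
            parts.push(i.name);   // snapshot the component at the head
        }
        return String.join("/", parts);
    }
}

public class PathRaceSketch {
    public static void main(String[] args) {
        Inode root = new Inode("", null);           // root has empty name
        Inode dir  = new Inode("user", root);
        Inode file = new Inode("data.txt", dir);
        System.out.println(file.fullPathSinglePass());   // /user/data.txt
    }
}
```

A snapshot read under concurrent renames may still be an inconsistent mix of old and new names, but it cannot throw ArrayIndexOutOfBoundsException and crash the thread, which is the failure mode reported here.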
[jira] [Updated] (HDFS-12832) INode.getFullPathName may throw ArrayIndexOutOfBoundsException lead to NameNode exit
[ https://issues.apache.org/jira/browse/HDFS-12832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

DENG FEI updated HDFS-12832:
----------------------------
    Attachment: exception.log

> INode.getFullPathName may throw ArrayIndexOutOfBoundsException lead to
> NameNode exit
> ----------------------------------------------------------------------
>
>                 Key: HDFS-12832
>                 URL: https://issues.apache.org/jira/browse/HDFS-12832
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.7.4, 3.0.0-beta1
>            Reporter: DENG FEI
>            Priority: Critical
>         Attachments: HDFS-12832-trunk-001.patch, exception.log
>
>
> {code:title=INode.java|borderStyle=solid}
> public String getFullPathName() {
>   // Get the full path name of this inode.
>   if (isRoot()) {
>     return Path.SEPARATOR;
>   }
>   // compute size of needed bytes for the path
>   int idx = 0;
>   for (INode inode = this; inode != null; inode = inode.getParent()) {
>     // add component + delimiter (if not tail component)
>     idx += inode.getLocalNameBytes().length + (inode != this ? 1 : 0);
>   }
>   byte[] path = new byte[idx];
>   for (INode inode = this; inode != null; inode = inode.getParent()) {
>     if (inode != this) {
>       path[--idx] = Path.SEPARATOR_CHAR;
>     }
>     byte[] name = inode.getLocalNameBytes();
>     idx -= name.length;
>     System.arraycopy(name, 0, path, idx, name.length);
>   }
>   return DFSUtil.bytes2String(path);
> }
> {code}
> We found an ArrayIndexOutOfBoundsException at
> _{color:#707070}System.arraycopy(name, 0, path, idx, name.length){color}_
> while the ReplicaMonitor was working, and the NameNode quit.
> It seems the two loops are not synchronized; the path's length changed
> between them.
[jira] [Comment Edited] (HDFS-12638) NameNode exits due to ReplicationMonitor thread received Runtime exception in ReplicationWork#chooseTargets
[ https://issues.apache.org/jira/browse/HDFS-12638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256293#comment-16256293 ]

Konstantin Shvachko edited comment on HDFS-12638 at 11/21/17 2:09 AM:
----------------------------------------------------------------------

I think it's a blocker for all branches 2.8 and up. Even just removing that line
{{toDelete.delete();}} would prevent crashing the NameNode. Or reverting
HDFS-9754 should also help.

was (Author: shv):
I think it's a blocker for all branches 2.8 and up. Even just removing that line
{{toDelete.delete();}} would prevent crashing the NameNode.

> NameNode exits due to ReplicationMonitor thread received Runtime exception in
> ReplicationWork#chooseTargets
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-12638
>                 URL: https://issues.apache.org/jira/browse/HDFS-12638
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 2.8.2
>            Reporter: Jiandan Yang
>            Priority: Blocker
>         Attachments: HDFS-12638-branch-2.8.2.001.patch, HDFS-12638.002.patch,
>                      OphanBlocksAfterTruncateDelete.jpg
>
>
> The active NameNode exits due to an NPE. I can confirm that the BlockCollection
> passed in when creating ReplicationWork is null, but I do not know why the
> BlockCollection is null. Looking at the history, I found
> [HDFS-9754|https://issues.apache.org/jira/browse/HDFS-9754] removed the check
> of whether the BlockCollection is null.
> NN logs are as follows:
> {code:java}
> 2017-10-11 16:29:06,161 ERROR [ReplicationMonitor] org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: ReplicationMonitor thread received Runtime exception.
> java.lang.NullPointerException
>         at org.apache.hadoop.hdfs.server.blockmanagement.ReplicationWork.chooseTargets(ReplicationWork.java:55)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1532)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1491)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3792)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3744)
>         at java.lang.Thread.run(Thread.java:834)
> {code}
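[Editor's note] The defensive shape discussed in the comment above can be sketched like this. It is a hypothetical illustration, not the committed patch: HDFS-9754 removed a null check on the BlockCollection, so a block orphaned by a truncate-then-delete sequence reaches chooseTargets with a null owner and the NPE kills the ReplicationMonitor thread. With the guard back, the orphaned block is simply skipped.

```java
// Stand-in for the file (INode) that owns a block.
class BlockCollection {
    final String name;
    BlockCollection(String n) { name = n; }
}

// Stand-in for a block queued for replication; owner is null for a block
// orphaned after truncate + delete.
class ReplBlock {
    final BlockCollection owner;
    ReplBlock(BlockCollection owner) { this.owner = owner; }
}

public class ChooseTargetsSketch {
    // With the null guard, an orphaned block is skipped instead of throwing
    // an NPE that exits the ReplicationMonitor thread (and the NameNode).
    static boolean scheduleReplication(ReplBlock b) {
        if (b.owner == null) {
            return false;   // orphaned: nothing to replicate for
        }
        return true;        // would proceed to chooseTargets(...)
    }

    public static void main(String[] args) {
        System.out.println(scheduleReplication(new ReplBlock(null)));
        System.out.println(scheduleReplication(new ReplBlock(new BlockCollection("f"))));
    }
}
```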
[jira] [Commented] (HDFS-12813) RequestHedgingProxyProvider can hide Exception thrown from the Namenode for proxy size of 1
[ https://issues.apache.org/jira/browse/HDFS-12813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260134#comment-16260134 ]

Hudson commented on HDFS-12813:
-------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13262 (See
[https://builds.apache.org/job/Hadoop-trunk-Commit/13262/])
HDFS-12813. RequestHedgingProxyProvider can hide Exception thrown from
(szetszwo: rev 659e85e304d070f9908a96cf6a0e1cbafde6a434)
* (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestRequestHedgingProxyProvider.java
* (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/RequestHedgingProxyProvider.java

> RequestHedgingProxyProvider can hide Exception thrown from the Namenode for
> proxy size of 1
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-12813
>                 URL: https://issues.apache.org/jira/browse/HDFS-12813
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>            Reporter: Mukul Kumar Singh
>            Assignee: Mukul Kumar Singh
>             Fix For: 3.0.0, 2.10.0
>
>         Attachments: HDFS-12813.001.patch, HDFS-12813.002.patch,
>                      HDFS-12813.003.patch, HDFS-12813.004.patch
>
>
> HDFS-11395 fixed the problem where the MultiException thrown by
> RequestHedgingProxyProvider was hidden. However, when the target proxy size is
> 1, unwrapping is not done for the InvocationTargetException. For a target
> proxy size of 1, the unwrapping should be done at the first level, whereas for
> multiple proxies it should be done at 2 levels.
[jira] [Updated] (HDFS-12813) RequestHedgingProxyProvider can hide Exception thrown from the Namenode for proxy size of 1
[ https://issues.apache.org/jira/browse/HDFS-12813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo Nicholas Sze updated HDFS-12813:
---------------------------------------
        Resolution: Fixed
    Fix Version/s: 2.10.0
                   3.0.0
            Status: Resolved  (was: Patch Available)

I have committed this. Thanks, Mukul!

> RequestHedgingProxyProvider can hide Exception thrown from the Namenode for
> proxy size of 1
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-12813
>                 URL: https://issues.apache.org/jira/browse/HDFS-12813
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>            Reporter: Mukul Kumar Singh
>            Assignee: Mukul Kumar Singh
>             Fix For: 3.0.0, 2.10.0
>
>         Attachments: HDFS-12813.001.patch, HDFS-12813.002.patch,
>                      HDFS-12813.003.patch, HDFS-12813.004.patch
>
>
> HDFS-11395 fixed the problem where the MultiException thrown by
> RequestHedgingProxyProvider was hidden. However, when the target proxy size is
> 1, unwrapping is not done for the InvocationTargetException. For a target
> proxy size of 1, the unwrapping should be done at the first level, whereas for
> multiple proxies it should be done at 2 levels.
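[Editor's note] The unwrapping rule described in the issue can be illustrated with a small standalone helper. This is not the committed RequestHedgingProxyProvider code: it only models the depth difference, where a single target proxy adds one InvocationTargetException wrapper around the NameNode's exception, while multiple proxies put the real cause one level deeper (inside a MultiException-style wrapper, modeled here as a plain Exception).

```java
import java.io.IOException;
import java.lang.reflect.InvocationTargetException;

public class UnwrapSketch {
    // Follow getCause() at most 'levels' times, stopping early at the root.
    static Throwable unwrap(Throwable t, int levels) {
        Throwable cur = t;
        for (int i = 0; i < levels && cur.getCause() != null; i++) {
            cur = cur.getCause();
        }
        return cur;
    }

    public static void main(String[] args) {
        IOException real = new IOException("from NameNode");

        // Single proxy: one reflection wrapper -> unwrap 1 level.
        Throwable oneProxy = new InvocationTargetException(real);
        // Multiple proxies: reflection wrapper around an aggregate -> 2 levels.
        Throwable manyProxy = new InvocationTargetException(new Exception("multi", real));

        System.out.println(unwrap(oneProxy, 1).getMessage());
        System.out.println(unwrap(manyProxy, 2).getMessage());
    }
}
```

The bug fixed here was effectively applying the two-level rule to the one-proxy case, which surfaced the wrapper instead of the NameNode's exception.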
[jira] [Commented] (HDFS-10183) Prevent race condition during class initialization
[ https://issues.apache.org/jira/browse/HDFS-10183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260111#comment-16260111 ]

Hadoop QA commented on HDFS-10183:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 9m 16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 19s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 34s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 82m 28s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}142m 44s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.fs.TestUnbuffer |
|                    | hadoop.hdfs.server.namenode.ha.TestHASafeMode |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-10183 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12794395/HDFS-10183.2.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux ace64cb573b6 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 60fc2a1 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/22146/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/22146/testReport/ |
| Max. process+thread count | 4159 (vs. ulimit of 5000) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console
[jira] [Updated] (HDFS-12840) Creating a replicated file in an EC zone is not correctly serialized in EditLogs
[ https://issues.apache.org/jira/browse/HDFS-12840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei (Eddy) Xu updated HDFS-12840:
---------------------------------
    Attachment: HDFS-12840.reprod.patch

Attaching a patch to reproduce this bug. Will post the fix soon.

> Creating a replicated file in an EC zone is not correctly serialized in EditLogs
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-12840
>                 URL: https://issues.apache.org/jira/browse/HDFS-12840
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: erasure-coding
>    Affects Versions: 3.0.0-beta1
>            Reporter: Lei (Eddy) Xu
>            Assignee: Lei (Eddy) Xu
>            Priority: Blocker
>         Attachments: HDFS-12840.reprod.patch
>
>
> When creating a replicated file in an existing EC zone, the edit log does not
> differentiate it from an EC file. When {{FSEditLogLoader}} replays the edits,
> the file is treated as an EC file; as a result, it crashes the NN because the
> blocks of this file are replicated, which does not match the {{INode}}.
> {noformat}
> ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered
> exception on operation AddBlockOp [path=/system/balancer.id,
> penultimateBlock=NULL, lastBlock=blk_1073743259_2455, RpcClientId=,
> RpcCallId=-2]
> java.lang.IllegalArgumentException: reportedBlock is not striped
>         at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStriped.addStorage(BlockInfoStriped.java:118)
>         at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.addBlock(DatanodeStorageInfo.java:256)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlock(BlockManager.java:3141)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlockUnderConstruction(BlockManager.java:3068)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:3864)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessages(BlockManager.java:2916)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessagesForBlock(BlockManager.java:2903)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.addNewBlock(FSEditLogLoader.java:1069)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:532)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397)
> {noformat}
[jira] [Created] (HDFS-12840) Creating a replicated file in a EC zone does not correctly serialized in EditLogs
Lei (Eddy) Xu created HDFS-12840: Summary: Creating a replicated file in a EC zone does not correctly serialized in EditLogs Key: HDFS-12840 URL: https://issues.apache.org/jira/browse/HDFS-12840 Project: Hadoop HDFS Issue Type: Bug Components: erasure-coding Affects Versions: 3.0.0-beta1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Priority: Blocker When creating a replicated file in an existing EC zone, the edit logs do not differentiate it from an EC file. When {{FSEditLogLoader}} replays the edits, this file is treated as an EC file; as a result, it crashes the NN because the blocks of this file are replicated, which does not match the {{INode}}. {noformat} ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation AddBlockOp [path=/system/balancer.id, penultimateBlock=NULL, lastBlock=blk_1073743259_2455, RpcClientId=, RpcCallId=-2] java.lang.IllegalArgumentException: reportedBlock is not striped at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStriped.addStorage(BlockInfoStriped.java:118) at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.addBlock(DatanodeStorageInfo.java:256) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlock(BlockManager.java:3141) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlockUnderConstruction(BlockManager.java:3068) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:3864) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessages(BlockManager.java:2916) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessagesForBlock(BlockManager.java:2903) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.addNewBlock(FSEditLogLoader.java:1069) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:532) at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397) {noformat}
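The crash above can be illustrated with a minimal, self-contained sketch (class and method names here are simplified stand-ins, not the actual HDFS implementation): the striped (EC) block metadata rejects a reported contiguous block, which is exactly what happens when the replayed edit does not record that the file was replicated.

```java
// Hypothetical, simplified sketch -- not the actual HDFS classes.
// Shows why replaying an edit for a replicated (contiguous) file against
// striped (EC) block metadata trips the precondition seen in the trace.

class Block {
    final boolean striped;
    Block(boolean striped) { this.striped = striped; }
}

class StripedBlockMeta {
    // Mirrors the behavior of BlockInfoStriped.addStorage, which rejects
    // non-striped blocks with an IllegalArgumentException.
    void addStorage(Block reported) {
        if (!reported.striped) {
            throw new IllegalArgumentException("reportedBlock is not striped");
        }
    }
}

public class EditReplaySketch {
    // The fix direction described in the issue: the edit-log entry must carry
    // whether the file is replicated, so the loader can pick the matching
    // block representation instead of assuming EC inside an EC zone.
    static String pickBlockType(boolean fileIsErasureCoded) {
        return fileIsErasureCoded ? "striped" : "contiguous";
    }

    public static void main(String[] args) {
        StripedBlockMeta stripedMeta = new StripedBlockMeta();
        boolean mismatch = false;
        try {
            // A replicated file's block handed to EC metadata, as in the bug.
            stripedMeta.addStorage(new Block(false));
        } catch (IllegalArgumentException e) {
            mismatch = true;
        }
        System.out.println(mismatch ? "mismatch detected" : "ok");
        System.out.println(pickBlockType(false));
    }
}
```

With the replication flag serialized per file, the loader would call `pickBlockType(false)` for such files and never reach the striped code path.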
[jira] [Updated] (HDFS-12347) TestBalancerRPCDelay#testBalancerRPCDelay fails very frequently
[ https://issues.apache.org/jira/browse/HDFS-12347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-12347: - Summary: TestBalancerRPCDelay#testBalancerRPCDelay fails very frequently (was: TestBalancerRPCDelay#testBalancerRPCDelay fails consistently) > TestBalancerRPCDelay#testBalancerRPCDelay fails very frequently > --- > > Key: HDFS-12347 > URL: https://issues.apache.org/jira/browse/HDFS-12347 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.0.0-beta1, 2.7.5, 3.0.1 >Reporter: Xiao Chen >Assignee: Bharat Viswanadham >Priority: Critical > Attachments: trunk.failed.xml > > > Seems to be failing consistently on trunk from yesterday-ish. > A sample failure is > https://builds.apache.org/job/PreCommit-HDFS-Build/20824/testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancerRPCDelay/testBalancerRPCDelay/ > Running locally failed with: > {noformat} > type="java.lang.AssertionError"> > {noformat}
[jira] [Commented] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260017#comment-16260017 ] Anu Engineer commented on HDFS-7240: h1. Ozone - First community meeting {{Time: Thursday, November 16, 2017, at 1:00:00 am PST}} _Participants: Anu Engineer, Mukul Kumar Singh, Nandakumar Vadivelu, Weiwei Yang, Steve Loughran, Thomas Demoor, Shashikant Banerjee, Lokesh Jain_ We discussed quite a large number of technical issues at this meeting. We went over how Ozone works, the namespace architecture of KSM, and how it interacts with SCM. We traced both a write I/O path and a read I/O path. There was some discussion over the REST protocol and making sure that the REST protocol is good enough to support Hadoop-based workloads. We looked at various REST APIs of Ozone and also discussed O3 FS working over RPC instead of the REST protocol. This is a work in progress. Steve Loughran suggested that we add Storm to the applications that are tested against Ozone. Currently, we use Hive, Spark, and YARN as the applications to test against Ozone. We will add Storm to this testing mix. We discussed performance and scale of testing; Ozone has been tested with millions of keys. We have also tested with cluster sizes up to 300 nodes. Steve suggested that we upgrade the Ratis version and lock that down before the merge. Thomas Demoor pointed out the difference between the commit ordering of S3 and Ozone. Ozone uses the actual commit time to decide the key ordering, while S3 uses the key creation time to decide the ordering of the keys. He also mentioned that this should not matter in the real world, as he is not aware of any hard-coded dependency on commit ordering. 
> Object store in HDFS > > > Key: HDFS-7240 > URL: https://issues.apache.org/jira/browse/HDFS-7240 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Jitendra Nath Pandey >Assignee: Jitendra Nath Pandey > Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, > HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, > HDFS-7240.004.patch, HDFS-7240.005.patch, HDFS-7240.006.patch, > MeetingMinutes.pdf, Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, > ozone_user_v0.pdf > > > This jira proposes to add object store capabilities into HDFS. > As part of the federation work (HDFS-1052) we separated block storage as a > generic storage layer. Using the Block Pool abstraction, new kinds of > namespaces can be built on top of the storage layer i.e. datanodes. > In this jira I will explore building an object store using the datanode > storage, but independent of namespace metadata. > I will soon update with a detailed design document.
[jira] [Created] (HDFS-12839) Refactor ratis-server tests to reduce the use DEFAULT_CALLID
Tsz Wo Nicholas Sze created HDFS-12839: -- Summary: Refactor ratis-server tests to reduce the use DEFAULT_CALLID Key: HDFS-12839 URL: https://issues.apache.org/jira/browse/HDFS-12839 Project: Hadoop HDFS Issue Type: Improvement Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor This JIRA is to help reduce the patch size in RATIS-141. We refactor the tests so that DEFAULT_CALLID is only used in MiniRaftCluster.
[jira] [Commented] (HDFS-12641) Backport HDFS-11755 into branch-2.7 to fix a regression in HDFS-11445
[ https://issues.apache.org/jira/browse/HDFS-12641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260010#comment-16260010 ] Wei-Chiu Chuang commented on HDFS-12641: Failed tests are due to a Jenkins OOM and don't appear related. Whitespace warnings are due to a file unrelated to the patch. The checkstyle warning needs to be addressed though. > Backport HDFS-11755 into branch-2.7 to fix a regression in HDFS-11445 > - > > Key: HDFS-12641 > URL: https://issues.apache.org/jira/browse/HDFS-12641 > Project: Hadoop HDFS > Issue Type: Task >Affects Versions: 2.7.4 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Blocker > Labels: release-blocker > Attachments: HDFS-12641.branch-2.7.001.patch > > > Our internal testing caught a regression in HDFS-11445 when we cherry-picked > the commit into CDH. Basically, it produces bogus missing file warnings. > Further analysis revealed that the regression is actually fixed by HDFS-11755. > Because of the order commits are merged in branch-2.8 ~ trunk (HDFS-11755 was > committed before HDFS-11445), the regression never actually surfaced for > Hadoop 2.8/3.0.0-(alpha/beta) users. Since branch-2.7 has HDFS-11445 but no > HDFS-11755, I suspect the regression is more visible for Hadoop 2.7.4. > I am filing this jira to raise more awareness, rather than simply backporting > HDFS-11755 into branch-2.7.
[jira] [Commented] (HDFS-12641) Backport HDFS-11755 into branch-2.7 to fix a regression in HDFS-11445
[ https://issues.apache.org/jira/browse/HDFS-12641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259984#comment-16259984 ] Hadoop QA commented on HDFS-12641: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 17m 5s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} branch-2.7 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 11s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 16s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 37s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 12s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 18s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 56s{color} | {color:green} branch-2.7 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | 
{color:green} javac {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 38s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 653 unchanged - 1 fixed = 655 total (was 654) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 21s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 60 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 1s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 47s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 1m 10s{color} | {color:red} The patch generated 284 ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}122m 37s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Unreaped Processes | hadoop-hdfs:17 | | Failed junit tests | hadoop.hdfs.TestBlocksScheduledCounter | | | hadoop.hdfs.TestWriteConfigurationToDFS | | | hadoop.hdfs.TestSetTimes | | | hadoop.hdfs.TestDFSRollback | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration | | | hadoop.hdfs.TestMiniDFSCluster | | | hadoop.hdfs.TestBalancerBandwidth | | | hadoop.hdfs.TestDFSClientRetries | | Timed out junit tests | org.apache.hadoop.hdfs.TestDatanodeRegistration | | | org.apache.hadoop.hdfs.TestDFSClientFailover | | | org.apache.hadoop.hdfs.web.TestWebHdfsTokens | | | org.apache.hadoop.hdfs.TestDFSInotifyEventInputStream | | | org.apache.hadoop.hdfs.TestFileAppendRestart | | | org.apache.hadoop.hdfs.TestSeekBug | | | org.apache.hadoop.hdfs.TestDFSMkdirs | | | org.apache.hadoop.hdfs.TestDatanodeReport | | | org.apache.hadoop.hdfs.web.TestWebHDFS | | | org.apache.hadoop.hdfs.web.TestWebHDFSXAttr | | | org.apache.hadoop.hdfs.TestDistributedFileSystem | | | org.apache.hadoop.hdfs.web.TestWebHDFSForHA | | | org.apache.hadoop.hdfs.TestDFSShell | | | org.apache.hadoop.hdfs.web.TestWebHDFSAcl | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:67e87c9 | | JIRA Issue | HDFS-12641 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12892157/HDFS-12641.branch-2.7.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit
[jira] [Commented] (HDFS-12794) Ozone: Parallelize ChunkOutputSream Writes to container
[ https://issues.apache.org/jira/browse/HDFS-12794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259970#comment-16259970 ] Hadoop QA commented on HDFS-12794: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} HDFS-7240 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 0s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 46s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 45s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 49s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 59s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 0s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 57s{color} | {color:green} HDFS-7240 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 43s{color} | {color:orange} hadoop-hdfs-project: The patch generated 3 new + 1 unchanged - 0 fixed = 4 total (was 1) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 9s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 57s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 39s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}104m 2s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 53s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}172m 50s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Unreaped Processes | hadoop-hdfs:6 | | Failed junit tests | hadoop.ozone.web.client.TestKeysRatis | | | hadoop.ozone.ozShell.TestOzoneShell | | | hadoop.hdfs.server.namenode.TestFSEditLogLoader | | | hadoop.ozone.TestOzoneConfigurationFields | | | hadoop.ozone.container.ozoneimpl.TestOzoneContainer | | | hadoop.hdfs.server.balancer.TestBalancerWithEncryptedTransfer | | | hadoop.hdfs.server.namenode.snapshot.TestSnapshotDeletion | | | hadoop.hdfs.server.namenode.TestFavoredNodesEndToEnd | | | hadoop.hdfs.server.namenode.TestDefaultBlockPlacementPolicy | | | hadoop.ozone.container.common.impl.TestContainerPersistence | |
[jira] [Updated] (HDFS-10183) Prevent race condition during class initialization
[ https://issues.apache.org/jira/browse/HDFS-10183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated HDFS-10183: -- Fix Version/s: (was: 2.9.0) 2.9.1 > Prevent race condition during class initialization > -- > > Key: HDFS-10183 > URL: https://issues.apache.org/jira/browse/HDFS-10183 > Project: Hadoop HDFS > Issue Type: Bug > Components: fs >Affects Versions: 2.9.0 >Reporter: Pavel Avgustinov >Assignee: Pavel Avgustinov >Priority: Minor > Fix For: 2.9.1 > > Attachments: HADOOP-12944.1.patch, HDFS-10183.2.patch > > > In HADOOP-11969, [~busbey] tracked down a non-deterministic > {{NullPointerException}} to an oddity in the Java memory model: When multiple > threads trigger the loading of a class at the same time, one of them wins and > creates the {{java.lang.Class}} instance; the others block during this > initialization, but once it is complete they may obtain a reference to the > {{Class}} which has non-{{final}} fields still containing their default (i.e. > {{null}}) values. This leads to runtime failures that are hard to debug or > diagnose. > HADOOP-11969 observed that {{ThreadLocal}} fields, by their very nature, are > very likely to be accessed from multiple threads, and thus the problem is > particularly severe there. Consequently, the patch removed all occurrences of > the issue in the code base. > Unfortunately, since then HDFS-7964 has [reverted one of the fixes during a > refactoring|https://github.com/apache/hadoop/commit/2151716832ad14932dd65b1a4e47e64d8d6cd767#diff-0c2e9f7f9e685f38d1a11373b627cfa6R151], > and introduced a [new instance of the > problem|https://github.com/apache/hadoop/commit/2151716832ad14932dd65b1a4e47e64d8d6cd767#diff-6334d0df7d9aefbccd12b21bb7603169R43]. > The attached patch addresses the issue by adding the missing {{final}} > modifier in these two cases. 
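The pattern HDFS-10183 restores can be shown with a minimal sketch (class and field names here are hypothetical, not the HDFS code): under the Java memory model, a thread that obtains a class reference during an initialization race is only guaranteed to see correctly published values for `final` static fields, so holders such as `ThreadLocal` instances should be declared `final`.

```java
// Minimal sketch of the "missing final" pattern (names hypothetical).
// During a class-initialization race, another thread can observe a class
// whose non-final static fields still hold their defaults (null);
// declaring the holder final avoids handing out such a reference.
public class ThreadLocalHolder {
    // Safe: final publication guarantees the assigned value is visible
    // to any thread that can see the field at all.
    static final ThreadLocal<StringBuilder> BUILDER =
        ThreadLocal.withInitial(StringBuilder::new);

    // Risky: a thread racing with class initialization may observe null here.
    static ThreadLocal<StringBuilder> unsafeBuilder =
        ThreadLocal.withInitial(StringBuilder::new);

    public static void main(String[] args) {
        BUILDER.get().append("ok");
        System.out.println(BUILDER.get().toString());
    }
}
```

The attached patch's fix is exactly the one-word change from the second form to the first.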
[jira] [Commented] (HDFS-12836) startTxId could be greater than endTxId when tailing in-progress edit log
[ https://issues.apache.org/jira/browse/HDFS-12836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259924#comment-16259924 ] Wei-Chiu Chuang commented on HDFS-12836: Thanks. From a supportability point of view this is definitely worth improving, because the error message is obscure. > startTxId could be greater than endTxId when tailing in-progress edit log > - > > Key: HDFS-12836 > URL: https://issues.apache.org/jira/browse/HDFS-12836 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.0-alpha1 >Reporter: Chao Sun >Assignee: Chao Sun > > When {{dfs.ha.tail-edits.in-progress}} is true, edit log tailer will also > tail those in progress edit log segments. However, in the following code: > {code} > if (onlyDurableTxns && inProgressOk) { > endTxId = Math.min(endTxId, committedTxnId); > } > EditLogInputStream elis = EditLogFileInputStream.fromUrl( > connectionFactory, url, remoteLog.getStartTxId(), > endTxId, remoteLog.isInProgress()); > {code} > it is possible that {{remoteLog.getStartTxId()}} could be greater than > {{endTxId}}, and therefore will cause the following error: > {code} > 2017-11-17 19:55:41,165 ERROR org.apache.hadoop.hdfs.server.namenode.FSImage: > Error replaying edit log at offset 1048576. 
Expected transaction ID was 87 > Recent opcode offsets: 1048576 > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream$PrematureEOFException: > got premature end-of-file at txid 86; expected file to go up to 85 > at > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:197) > at > org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85) > at > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:189) > at > org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:205) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:481) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393) > 2017-11-17 19:55:41,165 WARN > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Error while reading > edits from disk. Will try again. > org.apache.hadoop.hdfs.server.namenode.EditLogInputException: Error replaying > edit log at offset 1048576. 
Expected transaction ID was 87 > Recent opcode offsets: 1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:218) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:481) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393) > Caused by: > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream$PrematureEOFException: > got premature end-of-file at txid 86; expected file to go up to 85 > at >
[jira] [Commented] (HDFS-12836) startTxId could be greater than endTxId when tailing in-progress edit log
[ https://issues.apache.org/jira/browse/HDFS-12836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259891#comment-16259891 ] Chao Sun commented on HDFS-12836: - This issue happens when not all NNs in the cluster enable the in-progress tailing. In this case, the NNs without in-progress tailing enabled will not update the commit ID and cause it to be one less than the start/end txn ID. The commit update is handled by [this code | https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/QuorumOutputStream.java#L118]. A simple fix should just update {{endTxId}} to be the maximum between {{endTxId}} and {{remoteLog.getStartTxId()}}. However, I'm not sure if this is a valid issue since I assume people should always use the same configuration for all NNs. > startTxId could be greater than endTxId when tailing in-progress edit log > - > > Key: HDFS-12836 > URL: https://issues.apache.org/jira/browse/HDFS-12836 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.0-alpha1 >Reporter: Chao Sun >Assignee: Chao Sun > > When {{dfs.ha.tail-edits.in-progress}} is true, edit log tailer will also > tail those in progress edit log segments. However, in the following code: > {code} > if (onlyDurableTxns && inProgressOk) { > endTxId = Math.min(endTxId, committedTxnId); > } > EditLogInputStream elis = EditLogFileInputStream.fromUrl( > connectionFactory, url, remoteLog.getStartTxId(), > endTxId, remoteLog.isInProgress()); > {code} > it is possible that {{remoteLog.getStartTxId()}} could be greater than > {{endTxId}}, and therefore will cause the following error: > {code} > 2017-11-17 19:55:41,165 ERROR org.apache.hadoop.hdfs.server.namenode.FSImage: > Error replaying edit log at offset 1048576. 
Expected transaction ID was 87 > Recent opcode offsets: 1048576 > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream$PrematureEOFException: > got premature end-of-file at txid 86; expected file to go up to 85 > at > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:197) > at > org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85) > at > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:189) > at > org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:205) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:481) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393) > 2017-11-17 19:55:41,165 WARN > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Error while reading > edits from disk. Will try again. > org.apache.hadoop.hdfs.server.namenode.EditLogInputException: Error replaying > edit log at offset 1048576. 
Expected transaction ID was 87 > Recent opcode offsets: 1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:218) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380) >
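The clamping step and the invalid range it can produce can be sketched in isolation (method names here are hypothetical; in HDFS the logic lives in the edit-log selection code, and the comment above suggests adjusting {{endTxId}} rather than skipping):

```java
// Hedged sketch of the txid range problem discussed above.
// clampEndTxId mirrors the quoted snippet; rangeIsValid is the kind of
// guard a fix might add before constructing an EditLogInputStream.
public class TxIdRangeCheck {
    // When only durable transactions may be served, the end of the range is
    // clamped to the committed txid, which can lag the segment's start.
    static long clampEndTxId(long endTxId, long committedTxnId,
                             boolean onlyDurableTxns, boolean inProgressOk) {
        if (onlyDurableTxns && inProgressOk) {
            return Math.min(endTxId, committedTxnId);
        }
        return endTxId;
    }

    static boolean rangeIsValid(long startTxId, long endTxId) {
        return startTxId <= endTxId;
    }

    public static void main(String[] args) {
        // A NN without in-progress tailing leaves committedTxnId behind:
        // segment starts at 87 but endTxId is clamped to 85.
        long end = clampEndTxId(90, 85, true, true);
        System.out.println(end);
        System.out.println(rangeIsValid(87, end));
    }
}
```

When `rangeIsValid` is false, the stream would otherwise be built with `startTxId > endTxId`, producing the obscure "premature end-of-file" failure in the trace.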
[jira] [Commented] (HDFS-12832) INode.getFullPathName may throw ArrayIndexOutOfBoundsException lead to NameNode exit
[ https://issues.apache.org/jira/browse/HDFS-12832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259877#comment-16259877 ] Konstantin Shvachko commented on HDFS-12832: [~Deng FEI] could you please post the actual exception here if you have it? It would be good to see a stack trace. > INode.getFullPathName may throw ArrayIndexOutOfBoundsException lead to > NameNode exit > > > Key: HDFS-12832 > URL: https://issues.apache.org/jira/browse/HDFS-12832 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.4, 3.0.0-beta1 >Reporter: DENG FEI >Priority: Critical > Attachments: HDFS-12832-trunk-001.patch > > > {code:title=INode.java|borderStyle=solid} > public String getFullPathName() { > // Get the full path name of this inode. > if (isRoot()) { > return Path.SEPARATOR; > } > // compute size of needed bytes for the path > int idx = 0; > for (INode inode = this; inode != null; inode = inode.getParent()) { > // add component + delimiter (if not tail component) > idx += inode.getLocalNameBytes().length + (inode != this ? 1 : 0); > } > byte[] path = new byte[idx]; > for (INode inode = this; inode != null; inode = inode.getParent()) { > if (inode != this) { > path[--idx] = Path.SEPARATOR_CHAR; > } > byte[] name = inode.getLocalNameBytes(); > idx -= name.length; > System.arraycopy(name, 0, path, idx, name.length); > } > return DFSUtil.bytes2String(path); > } > {code} > We found ArrayIndexOutOfBoundsException at > _{color:#707070}System.arraycopy(name, 0, path, idx, name.length){color}_ > when ReplicaMonitor works, and the NameNode will quit. > It seems the two loops are not synchronized; the path's length changed > between them.
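One way to see why the two-pass approach is fragile, and a single-pass alternative, can be sketched with a simplified stand-in for {{INode}} (this is an illustration of the race-avoidance idea, not the patch attached to the issue): collecting the components in one traversal removes the separate "measure" pass, so a concurrent rename between the two loops can no longer overflow a pre-sized array.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hedged sketch with a simplified INode stand-in (names hypothetical).
// A deque sized by a single traversal cannot disagree with itself the way
// the original measure-then-copy pair of loops can under concurrent renames.
public class PathBuilder {
    static class Node {
        final String name;
        final Node parent;
        Node(String name, Node parent) { this.name = name; this.parent = parent; }
    }

    static String fullPath(Node node) {
        Deque<String> parts = new ArrayDeque<>();
        // One traversal: collect components root-ward, no second sizing pass.
        for (Node n = node; n != null && n.parent != null; n = n.parent) {
            parts.addFirst(n.name);
        }
        if (parts.isEmpty()) {
            return "/";  // root
        }
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append('/').append(p);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node root = new Node("", null);
        Node a = new Node("a", root);
        Node b = new Node("b", a);
        System.out.println(fullPath(b)); // prints /a/b
    }
}
```

The snapshot of components can still be stale under a concurrent rename, but the failure mode becomes a stale path rather than an `ArrayIndexOutOfBoundsException` that exits the NameNode.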
[jira] [Comment Edited] (HDFS-12786) Ozone: add port/service names to the ksm/scm web ui
[ https://issues.apache.org/jira/browse/HDFS-12786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257815#comment-16257815 ] Xiaoyu Yao edited comment on HDFS-12786 at 11/20/17 9:20 PM: - [~elek], thanks for reporting the issue and posting the fix. The change looks good to me. Can we remove the ":" before the port number? Otherwise, +1. was (Author: xyao): [~elek], thanks for reporting the issue and posting the fix. It looks good to me and I only have one question: do we need to export serverName from the ksm/scm UI (e.g., ksm.js or scm.js) so that the tag.serverName referred to in ozone.js is not null or empty? > Ozone: add port/service names to the ksm/scm web ui > --- > > Key: HDFS-12786 > URL: https://issues.apache.org/jira/browse/HDFS-12786 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Elek, Marton >Assignee: Elek, Marton >Priority: Minor > Attachments: HDFS-12786-HDFS-7240.001.patch > > > Since HDFS-12655 an additional serviceNames field is available for all rpc > services via the metrics interface. > This super small patch modifies the scm/ksm web ui to display this name. > Instead of > :9863 > we will display: > ScmBlockLocationProtocolService (:9863) > TESTING: > Start the dozone cluster and check the header of the rpc metrics section on the > web ui: http://localhost:9876/
[jira] [Commented] (HDFS-12804) Use slf4j instead of log4j in FSEditLog
[ https://issues.apache.org/jira/browse/HDFS-12804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259856#comment-16259856 ] Hudson commented on HDFS-12804: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13261 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13261/]) HDFS-12804. Use slf4j instead of log4j in FSEditLog. Contributed by (cliang: rev 60fc2a138827c2c29fa7e9d6844e3b8d43809726) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogRace.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogAutoroll.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestEditLogTailer.java > Use slf4j instead of log4j in FSEditLog > --- > > Key: HDFS-12804 > URL: https://issues.apache.org/jira/browse/HDFS-12804 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh > Attachments: HDFS-12804.001.patch, HDFS-12804.002.patch, > HDFS-12804.003.patch > > > FSEditLog uses log4j; this jira updates the logging to use slf4j.
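The main practical win of the slf4j migration is parameterized logging: the message is only formatted when the level is enabled, so callers can drop {{isDebugEnabled()}} guards. The sketch below illustrates that mechanism with a self-contained, hypothetical {{MiniLogger}} (not slf4j itself, and not the actual FSEditLog change):

```java
// Hypothetical MiniLogger demonstrating slf4j-style "{}" placeholders:
// when the level is disabled, the method returns before any string
// concatenation happens, which is why guards like isDebugEnabled()
// become unnecessary at call sites.
public class MiniLogger {
    private final boolean debugEnabled;

    public MiniLogger(boolean debugEnabled) { this.debugEnabled = debugEnabled; }

    // Returns the formatted message, or null when debug is disabled
    // (a real logger would write to an appender instead).
    public String debug(String template, Object... args) {
        if (!debugEnabled) {
            return null;                 // no formatting cost when disabled
        }
        StringBuilder sb = new StringBuilder();
        int argIdx = 0, from = 0, at;
        while ((at = template.indexOf("{}", from)) >= 0 && argIdx < args.length) {
            sb.append(template, from, at).append(args[argIdx++]);
            from = at + 2;
        }
        sb.append(template.substring(from));
        return sb.toString();
    }
}
```

With slf4j proper, the equivalent call site is `LOG.debug("Synced txid {} with {} ops", txId, ops)` instead of the log4j-era guarded string concatenation.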
[jira] [Updated] (HDFS-12804) Use slf4j instead of log4j in FSEditLog
[ https://issues.apache.org/jira/browse/HDFS-12804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-12804: -- Resolution: Fixed Status: Resolved (was: Patch Available) > Use slf4j instead of log4j in FSEditLog > --- > > Key: HDFS-12804 > URL: https://issues.apache.org/jira/browse/HDFS-12804 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh > Attachments: HDFS-12804.001.patch, HDFS-12804.002.patch, > HDFS-12804.003.patch > > > FSEditLog uses log4j; this jira updates the logging to use slf4j.
[jira] [Commented] (HDFS-12804) Use slf4j instead of log4j in FSEditLog
[ https://issues.apache.org/jira/browse/HDFS-12804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259818#comment-16259818 ] Chen Liang commented on HDFS-12804: --- Thanks [~msingh] for the update. I also tested locally; {{TestUnbuffer}} and {{TestBalancerRPCDelay}} did fail even without the patch. I've committed the v003 patch to trunk. Thanks Mukul for the contribution! > Use slf4j instead of log4j in FSEditLog > --- > > Key: HDFS-12804 > URL: https://issues.apache.org/jira/browse/HDFS-12804 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh > Attachments: HDFS-12804.001.patch, HDFS-12804.002.patch, > HDFS-12804.003.patch > > > FSEditLog uses log4j; this jira updates the logging to use slf4j.
[jira] [Commented] (HDFS-12641) Backport HDFS-11755 into branch-2.7 to fix a regression in HDFS-11445
[ https://issues.apache.org/jira/browse/HDFS-12641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259820#comment-16259820 ] Wei-Chiu Chuang commented on HDFS-12641: The test failures and warnings don't seem related. Triggering the precommit job again. > Backport HDFS-11755 into branch-2.7 to fix a regression in HDFS-11445 > - > > Key: HDFS-12641 > URL: https://issues.apache.org/jira/browse/HDFS-12641 > Project: Hadoop HDFS > Issue Type: Task >Affects Versions: 2.7.4 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Blocker > Labels: release-blocker > Attachments: HDFS-12641.branch-2.7.001.patch > > > Our internal testing caught a regression in HDFS-11445 when we cherry-picked > the commit into CDH. Basically, it produces bogus missing-file warnings. > Further analysis revealed that the regression is actually fixed by HDFS-11755. > Because of the order commits were merged in branch-2.8 ~ trunk (HDFS-11755 was > committed before HDFS-11445), the regression never actually surfaced for > Hadoop 2.8/3.0.0-(alpha/beta) users. Since branch-2.7 has HDFS-11445 but no > HDFS-11755, I suspect the regression is more visible for Hadoop 2.7.4. > I am filing this jira to raise more awareness, rather than simply backporting > HDFS-11755 into branch-2.7.
[jira] [Commented] (HDFS-12171) Reduce IIP object allocations for inode lookup
[ https://issues.apache.org/jira/browse/HDFS-12171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259815#comment-16259815 ] Wei-Chiu Chuang commented on HDFS-12171: For future reference: the object allocation came from the code refactoring in HDFS-7498. > Reduce IIP object allocations for inode lookup > -- > > Key: HDFS-12171 > URL: https://issues.apache.org/jira/browse/HDFS-12171 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.7.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Fix For: 2.9.0, 3.0.0-beta1, 2.8.3 > > Attachments: HDFS-12171.branch-2.patch, HDFS-12171.patch > > > {{IIP#getReadOnlyINodes}} is invoked frequently for EZ and EC lookups. It > allocates unnecessary objects to make the primitive array an immutable array > list. IIP already has a method for indexed inode retrieval that can be > tweaked to further improve performance.
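A small sketch of the allocation pattern the issue describes, using a toy holder (not the real {{INodesInPath}}): wrapping the internal array in an immutable list allocates objects on every hot-path call, whereas an indexed getter touches nothing on the heap.

```java
// Toy INodesPath holder (String components stand in for INode objects).
// Contrasts an allocating read-only accessor with allocation-free indexed
// retrieval; the negative-index convention (count from the end) mirrors
// the kind of indexed lookup the JIRA says IIP already supports.
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class INodesPath {
    private final String[] inodes;

    public INodesPath(String... inodes) { this.inodes = inodes; }

    // Allocates a List wrapper plus an unmodifiable wrapper on every call,
    // even when the caller only wants a single element.
    public List<String> getReadOnlyINodes() {
        return Collections.unmodifiableList(Arrays.asList(inodes));
    }

    // No per-call allocation; a negative index counts from the end.
    public String getINode(int i) {
        return i >= 0 ? inodes[i] : inodes[inodes.length + i];
    }
}
```

For lookups done on every EZ/EC path resolution, replacing `getReadOnlyINodes().get(k)`-style access with the indexed getter removes two short-lived objects per call.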
[jira] [Commented] (HDFS-12740) SCM should support a RPC to share the cluster Id with KSM and DataNodes
[ https://issues.apache.org/jira/browse/HDFS-12740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259781#comment-16259781 ] Nanda kumar commented on HDFS-12740: Thanks [~shashikant] for updating the patch; it looks good to me. Some minor comments: {{org.apache.hadoop.scm.container.common.helpers}} is not the right place for {{ScmInfo}}. We can move {{ScmInfo}} under {{org.apache.hadoop.scm}}. MiniOzoneClassicCluster.java: Lines 529-533 can be replaced with {{scmStore.setClusterId(clusterId.orElse(runID.toString()))}}, and Lines 534-538 can be replaced with {{scmStore.setScmId(scmId.orElse(UUID.randomUUID().toString()))}}. NITs: ScmBlockLocationProtocol.proto Lines 135, 136, 141 & 142 have incorrect indentation. StorageContainerManager.java Lines 1057 & 1058: the {{this}} keyword can be removed. > SCM should support a RPC to share the cluster Id with KSM and DataNodes > --- > > Key: HDFS-12740 > URL: https://issues.apache.org/jira/browse/HDFS-12740 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-7240 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee > Fix For: HDFS-7240 > > Attachments: HDFS-12740-HDFS-7240.001.patch, > HDFS-12740-HDFS-7240.002.patch, HDFS-12740-HDFS-7240.003.patch, > HDFS-12740-HDFS-7240.004.patch > > > When the ozone cluster is first created, the SCM --init command will generate a > cluster Id as well as an SCM Id and persist them locally. The same cluster Id and > SCM Id will be shared with KSM during KSM initialization and with DataNodes > during datanode registration.
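The {{Optional.orElse}} simplification suggested in the review can be sketched as follows; {{ScmStoreSketch}} and its field names are hypothetical stand-ins for the store being initialized, not the actual Ozone classes:

```java
// Hypothetical stand-in for the SCM store initialization discussed in the
// review: Optional.orElse collapses the 4-5 line if/present/else blocks
// into one expression per field.
import java.util.Optional;
import java.util.UUID;

public class ScmStoreSketch {
    private String clusterId;
    private String scmId;

    // clusterId defaults to the run ID; scmId defaults to a fresh UUID.
    public void init(Optional<String> clusterId, Optional<String> scmId, UUID runId) {
        this.clusterId = clusterId.orElse(runId.toString());
        this.scmId = scmId.orElse(UUID.randomUUID().toString());
    }

    public String getClusterId() { return clusterId; }
    public String getScmId() { return scmId; }
}
```

One caveat of plain `orElse`: its argument is always evaluated, so `orElse(UUID.randomUUID().toString())` generates a UUID even when the Optional is present; `orElseGet(() -> UUID.randomUUID().toString())` avoids that if it matters.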
[jira] [Commented] (HDFS-12730) Verify open files captured in the snapshots across config disable and enable
[ https://issues.apache.org/jira/browse/HDFS-12730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259768#comment-16259768 ] Hudson commented on HDFS-12730: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13259 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13259/]) HDFS-12730. Verify open files captured in the snapshots across config (manojpec: rev 9fb4effd2c4de2d83b667a43e8798315e85ff79b) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestOpenFilesWithSnapshot.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotManager.java > Verify open files captured in the snapshots across config disable and enable > > > Key: HDFS-12730 > URL: https://issues.apache.org/jira/browse/HDFS-12730 > Project: Hadoop HDFS > Issue Type: Test > Components: hdfs >Affects Versions: 3.0.0 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Fix For: 3.1.0 > > Attachments: HDFS-12730.01.patch, HDFS-12730.02.patch > > > Open files captured in the snapshots have their meta data preserved based on > the config > _dfs.namenode.snapshot.capture.openfiles_ (refer HDFS-11402). During the > upgrade scenario or when the NameNode gets restarted with config turned on or > off, the attributes of the open files captured in the snapshots are > influenced accordingly. Better to have a test case to verify open file > attributes across config turn on and off, and the current expected behavior > with HDFS-11402 so as to catch any regressions in the future.
[jira] [Commented] (HDFS-12826) Document Saying the RPC port, But it's required IPC port in Balancer Document.
[ https://issues.apache.org/jira/browse/HDFS-12826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259763#comment-16259763 ] Chen Liang commented on HDFS-12826: --- Thanks [~peruguusha] for the patch! Would it be more precise to go the other way, changing {{ipc}} to {{rpc}} instead? > Document Saying the RPC port, But it's required IPC port in Balancer Document. > -- > > Key: HDFS-12826 > URL: https://issues.apache.org/jira/browse/HDFS-12826 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover, documentation >Affects Versions: 3.0.0-beta1 >Reporter: Harshakiran Reddy >Assignee: usharani >Priority: Minor > Attachments: HDFS-12826.patch > > > In {{Adding a new Namenode to an existing HDFS cluster}}, the refreshNamenodes > command requires the IPC port, but the documentation says the RPC port. > http://hadoop.apache.org/docs/r3.0.0-beta1/hadoop-project-dist/hadoop-hdfs/Federation.html#Balancer > {noformat} > bin>:~/hdfsdata/HA/install/hadoop/datanode/bin> ./hdfs dfsadmin > -refreshNamenodes host-name:65110 > refreshNamenodes: Unknown protocol: > org.apache.hadoop.hdfs.protocol.ClientDatanodeProtocol > bin.:~/hdfsdata/HA/install/hadoop/datanode/bin> ./hdfs dfsadmin > -refreshNamenodes > Usage: hdfs dfsadmin [-refreshNamenodes datanode-host:ipc_port] > bin>:~/hdfsdata/HA/install/hadoop/datanode/bin> ./hdfs dfsadmin > -refreshNamenodes host-name:50077 > bin>:~/hdfsdata/HA/install/hadoop/datanode/bin> > {noformat}
[jira] [Comment Edited] (HDFS-12794) Ozone: Parallelize ChunkOutputSream Writes to container
[ https://issues.apache.org/jira/browse/HDFS-12794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259728#comment-16259728 ] Shashikant Banerjee edited comment on HDFS-12794 at 11/20/17 7:58 PM: -- Thanks [~anu] for the review comments. As per discussion with [~anu], here are a few conclusions:
1)
{code}
// make sure all the data in the ChunkOutputStreams is written to the
// container
Preconditions.checkArgument(
    semaphore.availablePermits() == getMaxOutstandingChunks());
}
{code}
While doing close on the groupOutputStream, we do chunkOutputStream.close, where we do future.get() on the response obtained after the write completes from the xceiver server, which makes sure the response is received from the xceiver server. While closing the groupStream, the semaphore permit count should equal the number of available permits, which equals the max number of outstanding chunks at any given point of time.
2)
{code}
throw new CompletionException(
    "Unexpected Storage Container Exception: " + e.toString(), e);
}
{code}
Hardcoding the exception when the writeChunkToContainer call completes in the xceiver server shows that the exception is caught in the chunkOutputGroupStream.close path, which is expected:
{code}
response = response.thenApply(reply -> {
  try {
    throw new IOException("Exception while validating response");
    // ContainerProtocolCalls.validateContainerResponse(reply);
    // return reply;
  } catch (IOException e) {
    throw new CompletionException(
        "Unexpected Storage Container Exception: " + e.toString(), e)
{code}
{code}
java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.ExecutionException: java.io.IOException: Exception while validating response
  at org.apache.hadoop.scm.storage.ChunkOutputStream.close(ChunkOutputStream.java:174)
  at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream$ChunkOutputStreamEntry.close(ChunkGroupOutputStream.java:468)
  at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.close(ChunkGroupOutputStream.java:291)
{code}
This is as expected. The idea was to write a mock test for the validateContainerResponse call, which is a static method of a final class; this requires PowerMockRunner, which leads to issues while bringing up the MiniOzoneCluster. Will add a unit test to verify the same later in a different jira. Patch v3 addresses the remaining review comments. [~anu]/others, please have a look.
> Ozone: Parallelize
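The pattern discussed above can be sketched in a self-contained form; the class and method names below ({{ChunkWriteSketch}}, {{writeChunk}}) are toy stand-ins for the Ozone classes, not the actual code. Validation inside {{thenApply}} wraps failures in a {{CompletionException}}, {{close()}} surfaces them via {{future.get()}}, and a semaphore that bounds outstanding chunk writes ends up fully released once every response has completed:

```java
// Toy sketch of the async chunk-write pattern: a bounded semaphore is
// acquired per write and released in whenComplete (success or failure),
// and a validation failure thrown inside thenApply propagates to the
// caller when close() blocks on the response future.
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Semaphore;

public class ChunkWriteSketch {
    static final int MAX_OUTSTANDING_CHUNKS = 4;
    private final Semaphore semaphore = new Semaphore(MAX_OUTSTANDING_CHUNKS);
    private CompletableFuture<String> response;

    public void writeChunk(boolean failValidation) {
        semaphore.acquireUninterruptibly();        // bound outstanding writes
        response = CompletableFuture.supplyAsync(() -> "reply")
            .thenApply(reply -> {
                try {
                    if (failValidation) {          // stand-in for response validation
                        throw new IOException("Exception while validating response");
                    }
                    return reply;
                } catch (IOException e) {
                    throw new CompletionException(
                        "Unexpected Storage Container Exception: " + e, e);
                }
            })
            .whenComplete((r, t) -> semaphore.release());
    }

    // Blocks until the write completes; a validation failure surfaces here
    // wrapped in an ExecutionException.
    public String close() throws InterruptedException, ExecutionException {
        return response.get();
    }

    public int availablePermits() {
        return semaphore.availablePermits();
    }
}
```

After {{close()}} returns or throws, all permits are back, which is the invariant the {{Preconditions.checkArgument}} in the quoted snippet asserts.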
[jira] [Comment Edited] (HDFS-12794) Ozone: Parallelize ChunkOutputSream Writes to container
[ https://issues.apache.org/jira/browse/HDFS-12794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259728#comment-16259728 ] Shashikant Banerjee edited comment on HDFS-12794 at 11/20/17 7:57 PM: -- Thanks [~anu] , for the review comments. As per discussion with [~anu], here are few conclusions: 1) {code} //make sure all the data in the ChunkoutputStreams is written to the // container Preconditions.checkArgument( semaphore.availablePermits() == getMaxOutstandingChunks()); } {code} While doing close on the groupOutputStream, we do chunkOutputstream.close, where we do future.get() on response obtained after the write completes from the xceiver server which makes sure the response is received from the xceiver server. While closing the groupStream, semaphorePermiCount should be equal to no of available permits which is equal to max no of outstanding chunks at any given point of time. 2. {code} throw new CompletionException( "Unexpected Storage Container Exception: " + e.toString(), e); } {code} Hardcoding the exception when the writeChunkToConatiner calls completes in the xceiverServer shows that , the exception is caught in the chunkoutputGroupStream.close path which is expected. 
{code}   response = response.thenApply(reply -> { try{   throw new IOException("Exception while validating response"); // ContainerProtocolCalls.validateContainerResponse(reply); // return reply; }catch (IOException e){   throw new CompletionException(   "Unexpected Storage Container Exception: " + e.toString(), e) {code} java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.ExecutionException: java.io.IOException: Exception while validating response   at org.apache.hadoop.scm.storage.ChunkOutputStream.close(ChunkOutputStream.java:174)   at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream$ChunkOutputStreamEntry.close(ChunkGroupOutputStream.java:468)   at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.close(ChunkGroupOutputStream.java:291) {code} This is as expected. Idea was to write a mocktest while validatingContainerResposne calls which is static method of a final class, and this requires powerMockrunner which leads to issues while bringing up the miniOzoneCluster.Will address the unit test to vertify the same later in a different jira. Patch v3 addresses the remaining review comments. [~anu]/others, please have a look. was (Author: shashikant): Thanks [~anu] , for the review comments. As per discussion with [~anu], here are few conclusions: 1) code {} //make sure all the data in the ChunkoutputStreams is written to the // container Preconditions.checkArgument( semaphore.availablePermits() == getMaxOutstandingChunks()); } code {} While doing close on the groupOutputStream, we do chunkOutputstream.close, where we do future.get() on response obtained after the write completes from the xceiver server which makes sure the response is received from the xceiver server. While closing the groupStream, semaphorePermiCount should be equal to no of available permits which is equal to max no of outstanding chunks at any given point of time. 2. 
code {} throw new CompletionException( "Unexpected Storage Container Exception: " + e.toString(), e); } code {} Hardcoding the exception when the writeChunkToConatiner calls completes in the xceiverServer shows that , the exception is caught in the chunkoutputGroupStream.close path which is expected. code{}   response = response.thenApply(reply -> { try{   throw new IOException("Exception while validating response"); // ContainerProtocolCalls.validateContainerResponse(reply); // return reply; }catch (IOException e){   throw new CompletionException(   "Unexpected Storage Container Exception: " + e.toString(), e) code{} java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.ExecutionException: java.io.IOException: Exception while validating response   at org.apache.hadoop.scm.storage.ChunkOutputStream.close(ChunkOutputStream.java:174)   at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream$ChunkOutputStreamEntry.close(ChunkGroupOutputStream.java:468)   at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.close(ChunkGroupOutputStream.java:291) code{} This is as expected. Idea was to write a mocktest while validatingContainerResposne calls which is static method of a final class, and this requires powerMockrunner which leads to issues while bringing up the miniOzoneCluster.Will address the unit test to vertify the same later in a different jira. Patch v3 addresses the remaining review comments. [~anu]/others, please have a look. > Ozone: Parallelize ChunkOutputSream
[jira] [Comment Edited] (HDFS-12794) Ozone: Parallelize ChunkOutputSream Writes to container
[ https://issues.apache.org/jira/browse/HDFS-12794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259728#comment-16259728 ] Shashikant Banerjee edited comment on HDFS-12794 at 11/20/17 7:50 PM: -- Thanks [~anu] , for the review comments. As per discussion with [~anu], here are few conclusions: 1) code {} //make sure all the data in the ChunkoutputStreams is written to the // container Preconditions.checkArgument( semaphore.availablePermits() == getMaxOutstandingChunks()); } code {} While doing close on the groupOutputStream, we do chunkOutputstream.close, where we do future.get() on response obtained after the write completes from the xceiver server which makes sure the response is received from the xceiver server. While closing the groupStream, semaphorePermiCount should be equal to no of available permits which is equal to max no of outstanding chunks at any given point of time. 2. code {} throw new CompletionException( "Unexpected Storage Container Exception: " + e.toString(), e); } code {} Hardcoding the exception when the writeChunkToConatiner calls completes in the xceiverServer shows that , the exception is caught in the chunkoutputGroupStream.close path which is expected. 
code{}   response = response.thenApply(reply -> { try{   throw new IOException("Exception while validating response"); // ContainerProtocolCalls.validateContainerResponse(reply); // return reply; }catch (IOException e){   throw new CompletionException(   "Unexpected Storage Container Exception: " + e.toString(), e) code{} java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.ExecutionException: java.io.IOException: Exception while validating response   at org.apache.hadoop.scm.storage.ChunkOutputStream.close(ChunkOutputStream.java:174)   at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream$ChunkOutputStreamEntry.close(ChunkGroupOutputStream.java:468)   at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.close(ChunkGroupOutputStream.java:291) code{} This is as expected. Idea was to write a mocktest while validatingContainerResposne calls which is static method of a final class, and this requires powerMockrunner which leads to issues while bringing up the miniOzoneCluster.Will address the unit test to vertify the same later in a different jira. Patch v3 addresses the remaining review comments. [~anu]/others, please have a look. was (Author: shashikant): Thanks [~anu] , for the review comments. As per discussion with [~anu], here are few conclusions: 1) code { //make sure all the data in the ChunkoutputStreams is written to the // container Preconditions.checkArgument( semaphore.availablePermits() == getMaxOutstandingChunks()); } While doing close on the groupOutputStream, we do chunkOutputstream.close, where we do future.get() on response obtained after the write completes from the xceiver server which makes sure the response is received from the xceiver server. While closing the groupStream, semaphorePermiCount should be equal to no of available permits which is equal to max no of outstanding chunks at any given point of time. 2. 
code { throw new CompletionException( "Unexpected Storage Container Exception: " + e.toString(), e); } Hardcoding the exception when the writeChunkToConatiner calls completes in the xceiverServer shows that , the exception is caught in the chunkoutputGroupStream.close path which is expected. Code {   response = response.thenApply(reply -> { try{   throw new IOException("Exception while validating response"); // ContainerProtocolCalls.validateContainerResponse(reply); // return reply; }catch (IOException e){   throw new CompletionException(   "Unexpected Storage Container Exception: " + e.toString(), e) java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.ExecutionException: java.io.IOException: Exception while validating response   at org.apache.hadoop.scm.storage.ChunkOutputStream.close(ChunkOutputStream.java:174)   at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream$ChunkOutputStreamEntry.close(ChunkGroupOutputStream.java:468)   at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.close(ChunkGroupOutputStream.java:291) } This is as expected. Idea was to write a mocktest while validatingContainerResposne calls which is static method of a final class, and this requires powerMockrunner which leads to issues while bringing up the miniOzoneCluster.Will address the unit test to vertify the same later in a different jira. Patch v3 addresses the remaining review comments. [~anu]/others, please have a look. > Ozone: Parallelize ChunkOutputSream Writes to container >
[jira] [Comment Edited] (HDFS-12794) Ozone: Parallelize ChunkOutputSream Writes to container
[ https://issues.apache.org/jira/browse/HDFS-12794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259728#comment-16259728 ] Shashikant Banerjee edited comment on HDFS-12794 at 11/20/17 7:45 PM: -- Thanks [~anu] , for the review comments. As per discussion with [~anu], here are few conclusions: 1) code { //make sure all the data in the ChunkoutputStreams is written to the // container Preconditions.checkArgument( semaphore.availablePermits() == getMaxOutstandingChunks()); } While doing close on the groupOutputStream, we do chunkOutputstream.close, where we do future.get() on response obtained after the write completes from the xceiver server which makes sure the response is received from the xceiver server. While closing the groupStream, semaphorePermiCount should be equal to no of available permits which is equal to max no of outstanding chunks at any given point of time. 2. code { throw new CompletionException( "Unexpected Storage Container Exception: " + e.toString(), e); } Hardcoding the exception when the writeChunkToConatiner calls completes in the xceiverServer shows that , the exception is caught in the chunkoutputGroupStream.close path which is expected. 
Code {   response = response.thenApply(reply -> { try{   throw new IOException("Exception while validating response"); // ContainerProtocolCalls.validateContainerResponse(reply); // return reply; }catch (IOException e){   throw new CompletionException(   "Unexpected Storage Container Exception: " + e.toString(), e) java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.ExecutionException: java.io.IOException: Exception while validating response   at org.apache.hadoop.scm.storage.ChunkOutputStream.close(ChunkOutputStream.java:174)   at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream$ChunkOutputStreamEntry.close(ChunkGroupOutputStream.java:468)   at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.close(ChunkGroupOutputStream.java:291) } This is as expected. Idea was to write a mocktest while validatingContainerResposne calls which is static method of a final class, and this requires powerMockrunner which leads to issues while bringing up the miniOzoneCluster.Will address the unit test to vertify the same later in a different jira. Patch v3 addresses the remaining review comments. [~anu]/others, please have a look. was (Author: shashikant): Thanks [~anu] , for the review comments. As per discussion with [~anu], here are few conclusions: 1) code { //make sure all the data in the ChunkoutputStreams is written to the // container Preconditions.checkArgument( semaphore.availablePermits() == getMaxOutstandingChunks()); } While doing close on the groupOutputStream, we do chunkOutputstream.close, where we do future.get() on response obtained after the write completes from the xceiver server which makes sure the response is received from the xceiver server. While closing the groupStream, semaphorePermiCount should be equal to no of available permits which is equal to max no of outstanding chunks at any given point of time. 2. 
code { throw new CompletionException( "Unexpected Storage Container Exception: " + e.toString(), e); } Hardcoding the exception when the writeChunkToConatiner calls completes in the xceiverServer shows that , the exception is caught in the chunkoutputGroupStream.close path which is expected. Code {   response = response.thenApply(reply -> { try {   throw new IOException("Exception while validating response"); // ContainerProtocolCalls.validateContainerResponse(reply); // return reply; } catch (IOException e) {   throw new CompletionException(   "Unexpected Storage Container Exception: " + e.toString(), e); } java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.ExecutionException: java.io.IOException: Exception while validating response   at org.apache.hadoop.scm.storage.ChunkOutputStream.close(ChunkOutputStream.java:174)   at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream$ChunkOutputStreamEntry.close(ChunkGroupOutputStream.java:468)   at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.close(ChunkGroupOutputStream.java:291) } This is as expected. Idea was to write a mocktest while validatingContainerResposne calls which is static method of a final class, and this requires powerMockrunner which leads to issues while bringing up the miniOzoneCluster.Will address the unit test to vertify the same later in a different jira. Patch v3 addresses the remaining review comments. [~anu]/others, please have a look. > Ozone: Parallelize ChunkOutputSream Writes to container >
[jira] [Comment Edited] (HDFS-12794) Ozone: Parallelize ChunkOutputSream Writes to container
[ https://issues.apache.org/jira/browse/HDFS-12794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259728#comment-16259728 ] Shashikant Banerjee edited comment on HDFS-12794 at 11/20/17 7:43 PM: --

Thanks [~anu], for the review comments. As per discussion with [~anu], here are a few conclusions:

1)
{code}
// make sure all the data in the ChunkOutputStreams is written to the
// container
Preconditions.checkArgument(
    semaphore.availablePermits() == getMaxOutstandingChunks());
{code}
While closing the ChunkGroupOutputStream we call chunkOutputStream.close(), which does a future.get() on the response obtained once the write completes on the xceiver server; this makes sure the response has been received. At that point the number of available semaphore permits should equal the maximum number of outstanding chunks allowed at any given point in time.

2)
{code}
throw new CompletionException(
    "Unexpected Storage Container Exception: " + e.toString(), e);
{code}
Hardcoding the exception at the point where the writeChunkToContainer call completes on the xceiver server shows that the exception is caught in the ChunkGroupOutputStream.close() path, which is expected.
{code}
response = response.thenApply(reply -> {
  try {
    throw new IOException("Exception while validating response");
    // ContainerProtocolCalls.validateContainerResponse(reply);
    // return reply;
  } catch (IOException e) {
    throw new CompletionException(
        "Unexpected Storage Container Exception: " + e.toString(), e);
  }
});
{code}
{noformat}
java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.ExecutionException: java.io.IOException: Exception while validating response
  at org.apache.hadoop.scm.storage.ChunkOutputStream.close(ChunkOutputStream.java:174)
  at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream$ChunkOutputStreamEntry.close(ChunkGroupOutputStream.java:468)
  at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.close(ChunkGroupOutputStream.java:291)
{noformat}
This is as expected. The idea was to write a mock test around the validateContainerResponse call, which is a static method of a final class; that requires PowerMockRunner, which causes issues while bringing up the MiniOzoneCluster. Will add a unit test to verify the same in a separate JIRA. Patch v3 addresses the remaining review comments. [~anu]/others, please have a look.

was (Author: shashikant): Thanks [~anu], for the review comments. As per discussion with [~anu], here are a few conclusions:

1)
{code}
// make sure all the data in the ChunkOutputStreams is written to the
// container
Preconditions.checkArgument(
    semaphore.availablePermits() == getMaxOutstandingChunks());
{code}
While closing the ChunkGroupOutputStream we call chunkOutputStream.close(), which does a future.get() on the response obtained once the write completes on the xceiver server; this makes sure the response has been received. At that point the number of available semaphore permits should equal the maximum number of outstanding chunks allowed at any given point in time.

2)
{code}
throw new CompletionException(
    "Unexpected Storage Container Exception: " + e.toString(), e);
{code}
Hardcoding the exception at the point where the writeChunkToContainer call completes on the xceiver server shows that the exception is caught in the ChunkGroupOutputStream.close() path, which is expected.
{code}
try {
  String requestID =
      traceID + chunkIndex + ContainerProtos.Type.WriteChunk.name();
  // add the chunk write traceId to the queue
  semaphore.acquire();
  LOG.warn("calling async");
  response =
      writeChunkAsync(xceiverClient, chunk, key, data, requestID);
  response = response.thenApply(reply -> {
    try {
      throw new IOException("Exception while validating response");
      // ContainerProtocolCalls.validateContainerResponse(reply);
      // return reply;
    } catch (IOException e) {
      LOG.info("coming here to throw exception");
      throw new CompletionException(
          "Unexpected Storage Container Exception: " + e.toString(), e);
    }
  });
}
{code}
{noformat}
java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.ExecutionException: java.io.IOException: Exception while validating response
  at org.apache.hadoop.scm.storage.ChunkOutputStream.close(ChunkOutputStream.java:174)
  at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream$ChunkOutputStreamEntry.close(ChunkGroupOutputStream.java:468)
  at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.close(ChunkGroupOutputStream.java:291)
{noformat}
This is as expected. The idea was to write a mock test around the validateContainerResponse
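The exception path described above can be reproduced in isolation. The following is a hypothetical standalone sketch (not the Ozone code itself; the class and method names are made up for illustration) showing how an exception thrown inside {{thenApply}} is wrapped in a {{CompletionException}} and only surfaces when the future is joined, which is what the {{close()}} path observes:

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// Hypothetical sketch: an exception thrown inside thenApply() completes the
// future exceptionally and resurfaces at join()/get(), mirroring the
// stack trace seen in ChunkOutputStream.close().
public class AsyncValidationSketch {
  public static String failingValidation() {
    CompletableFuture<String> response =
        CompletableFuture.completedFuture("reply");
    CompletableFuture<String> validated = response.thenApply(reply -> {
      try {
        // Simulate validateContainerResponse() rejecting the reply.
        throw new IOException("Exception while validating response");
      } catch (IOException e) {
        throw new CompletionException(
            "Unexpected Storage Container Exception: " + e, e);
      }
    });
    try {
      validated.join();  // what close() effectively does via future.get()
      return "no exception";
    } catch (CompletionException e) {
      // The original IOException is preserved as the cause.
      return e.getCause().getMessage();
    }
  }
}
```

Because the thrown exception is already a {{CompletionException}}, {{join()}} rethrows it as-is rather than double-wrapping it, so the original {{IOException}} remains the direct cause.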
[jira] [Commented] (HDFS-12799) Ozone: SCM: Close containers: extend SCMCommandResponseProto with SCMCloseContainerCmdResponseProto
[ https://issues.apache.org/jira/browse/HDFS-12799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259729#comment-16259729 ] Chen Liang commented on HDFS-12799: --- Thanks for working on this [~elek]. Returning the current state does seem to be a bug and was causing a test failure, so I fixed it in HDFS-12793. As for creating a container, have you tried calling {{cluster.getStorageContainerManager().allocateContainer}}, followed by two {{mapping.updateContainerState}} calls, just like what {{TestContainerMapping#createContainer}} is doing? > Ozone: SCM: Close containers: extend SCMCommandResponseProto with > SCMCloseContainerCmdResponseProto > --- > > Key: HDFS-12799 > URL: https://issues.apache.org/jira/browse/HDFS-12799 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Elek, Marton >Assignee: Elek, Marton > Attachments: HDFS-12799-HDFS-7240.001.patch > > > This issue is about extending the HB response protocol between SCM and DN > with a command to ask the datanode to close a container. (This is just about > extending the protocol, not about fixing the implementation of SCM to handle > the state transitions). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12794) Ozone: Parallelize ChunkOutputSream Writes to container
[ https://issues.apache.org/jira/browse/HDFS-12794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDFS-12794: --- Attachment: HDFS-12794-HDFS-7240.003.patch

Thanks [~anu], for the review comments. As per discussion with [~anu], here are a few conclusions:

1)
{code}
// make sure all the data in the ChunkOutputStreams is written to the
// container
Preconditions.checkArgument(
    semaphore.availablePermits() == getMaxOutstandingChunks());
{code}
While closing the ChunkGroupOutputStream we call chunkOutputStream.close(), which does a future.get() on the response obtained once the write completes on the xceiver server; this makes sure the response has been received. At that point the number of available semaphore permits should equal the maximum number of outstanding chunks allowed at any given point in time.

2)
{code}
throw new CompletionException(
    "Unexpected Storage Container Exception: " + e.toString(), e);
{code}
Hardcoding the exception at the point where the writeChunkToContainer call completes on the xceiver server shows that the exception is caught in the ChunkGroupOutputStream.close() path, which is expected.
{code}
try {
  String requestID =
      traceID + chunkIndex + ContainerProtos.Type.WriteChunk.name();
  // add the chunk write traceId to the queue
  semaphore.acquire();
  LOG.warn("calling async");
  response =
      writeChunkAsync(xceiverClient, chunk, key, data, requestID);
  response = response.thenApply(reply -> {
    try {
      throw new IOException("Exception while validating response");
      // ContainerProtocolCalls.validateContainerResponse(reply);
      // return reply;
    } catch (IOException e) {
      LOG.info("coming here to throw exception");
      throw new CompletionException(
          "Unexpected Storage Container Exception: " + e.toString(), e);
    }
  });
}
{code}
{noformat}
java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.ExecutionException: java.io.IOException: Exception while validating response
  at org.apache.hadoop.scm.storage.ChunkOutputStream.close(ChunkOutputStream.java:174)
  at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream$ChunkOutputStreamEntry.close(ChunkGroupOutputStream.java:468)
  at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.close(ChunkGroupOutputStream.java:291)
{noformat}
This is as expected. The idea was to write a mock test around the validateContainerResponse call, which is a static method of a final class; that requires PowerMockRunner, which causes issues while bringing up the MiniOzoneCluster. Will add a unit test to verify the same in a separate JIRA. Patch v3 addresses the remaining review comments. [~anu]/others, please have a look.
> Ozone: Parallelize ChunkOutputSream Writes to container > --- > > Key: HDFS-12794 > URL: https://issues.apache.org/jira/browse/HDFS-12794 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee > Fix For: HDFS-7240 > > Attachments: HDFS-12794-HDFS-7240.001.patch, > HDFS-12794-HDFS-7240.002.patch, HDFS-12794-HDFS-7240.003.patch > > > The ChunkOutputStream writes are sync in nature. Once one chunk of data gets > written, the next chunk write is blocked until the previous chunk is written > to the container. > The ChunkOutputStream writes should be made async, and close on the > OutputStream should ensure flushing of all dirty buffers to the container. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
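The semaphore invariant discussed in the comment above (all permits available again once close() has drained every in-flight write) can be modeled in isolation. The following is a hypothetical standalone sketch, not the Ozone code; the class name, permit count, and method names are made up for illustration:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

// Hypothetical model of the throttling invariant: one permit is held per
// in-flight chunk write and released when its response completes, so after
// draining all writes the available permits return to the configured maximum.
public class ChunkThrottleSketch {
  static final int MAX_OUTSTANDING_CHUNKS = 4;
  private final Semaphore semaphore = new Semaphore(MAX_OUTSTANDING_CHUNKS);
  private final ExecutorService pool = Executors.newFixedThreadPool(2);

  CompletableFuture<Void> writeChunkAsync(int chunkIndex) {
    semaphore.acquireUninterruptibly();  // block when too many are in flight
    return CompletableFuture
        .runAsync(() -> { /* send chunk #chunkIndex to the container */ }, pool)
        .whenComplete((v, t) -> semaphore.release());
  }

  public int drainAndCheck(int chunks) {
    CompletableFuture<?>[] futures = new CompletableFuture<?>[chunks];
    for (int i = 0; i < chunks; i++) {
      futures[i] = writeChunkAsync(i);
    }
    CompletableFuture.allOf(futures).join();  // what close() effectively waits on
    pool.shutdown();
    return semaphore.availablePermits();      // back to MAX_OUTSTANDING_CHUNKS
  }
}
```

Because each release runs before its {{whenComplete}}-returned future completes, waiting on {{allOf}} guarantees every permit is back, which is exactly the precondition {{Preconditions.checkArgument}} asserts at close time.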
[jira] [Commented] (HDFS-12807) Ozone: Expose RockDB stats via JMX for Ozone metadata stores
[ https://issues.apache.org/jira/browse/HDFS-12807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259709#comment-16259709 ] Hadoop QA commented on HDFS-12807: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 16m 37s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} HDFS-7240 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 36s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 47s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 53s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 54s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 43s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 16s{color} | {color:green} HDFS-7240 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 43s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 42s{color} | {color:orange} hadoop-hdfs-project: The patch generated 11 new + 0 unchanged - 0 fixed = 11 total (was 0) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 3s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 14s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 56s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 31s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}131m 2s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}220m 30s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Unreaped Processes | hadoop-hdfs:1 | | Failed junit tests | hadoop.ozone.ozShell.TestOzoneShell | | | hadoop.hdfs.TestLeaseRecovery2 | | | hadoop.ozone.TestOzoneConfigurationFields | | | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover | | | hadoop.ozone.container.common.impl.TestContainerPersistence | | | hadoop.ozone.client.rpc.TestOzoneRpcClient | | | hadoop.hdfs.TestRollingUpgrade | | | hadoop.cblock.TestCBlockReadWrite | | | hadoop.fs.TestUnbuffer | | | hadoop.ozone.web.client.TestKeys | | Timed out junit tests | org.apache.hadoop.cblock.TestLocalBlockCache | \\ \\ || Subsystem ||
[jira] [Updated] (HDFS-12787) Ozone: SCM: Aggregate the metrics from all the container reports
[ https://issues.apache.org/jira/browse/HDFS-12787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-12787: -- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: HDFS-7240 Status: Resolved (was: Patch Available) Thanks [~linyiqun] for the contribution. +1 for the latest patch. I've committed to the feature branch. > Ozone: SCM: Aggregate the metrics from all the container reports > > > Key: HDFS-12787 > URL: https://issues.apache.org/jira/browse/HDFS-12787 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: metrics, ozone >Affects Versions: HDFS-7240 >Reporter: Yiqun Lin >Assignee: Yiqun Lin > Fix For: HDFS-7240 > > Attachments: HDFS-12787-HDFS-7240.001.patch, > HDFS-12787-HDFS-7240.002.patch, HDFS-12787-HDFS-7240.003.patch, > HDFS-12787-HDFS-7240.004.patch, HDFS-12787-HDFS-7240.005.patch, > HDFS-12787-HDFS-7240.006.patch > > > We should aggregate the metrics from all the reports of different datanodes > in addition to the last report. This way, we can get a global view of the > container I/Os over the ozone cluster. This is a follow up work of HDFS-11468. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12730) Verify open files captured in the snapshots across config disable and enable
[ https://issues.apache.org/jira/browse/HDFS-12730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12730: -- Resolution: Fixed Fix Version/s: 3.1.0 Status: Resolved (was: Patch Available) Thanks for the review [~yzhangal] and [~hanishakoneru]. Committed it to trunk. > Verify open files captured in the snapshots across config disable and enable > > > Key: HDFS-12730 > URL: https://issues.apache.org/jira/browse/HDFS-12730 > Project: Hadoop HDFS > Issue Type: Test > Components: hdfs >Affects Versions: 3.0.0 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Fix For: 3.1.0 > > Attachments: HDFS-12730.01.patch, HDFS-12730.02.patch > > > Open files captured in the snapshots have their meta data preserved based on > the config > _dfs.namenode.snapshot.capture.openfiles_ (refer HDFS-11402). During the > upgrade scenario or when the NameNode gets restarted with config turned on or > off, the attributes of the open files captured in the snapshots are > influenced accordingly. Better to have a test case to verify open file > attributes across config turn on and off, and the current expected behavior > with HDFS-11402 so as to catch any regressions in the future. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12836) startTxId could be greater than endTxId when tailing in-progress edit log
[ https://issues.apache.org/jira/browse/HDFS-12836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259559#comment-16259559 ] Wei-Chiu Chuang commented on HDFS-12836: Updated Affects Version/s based on the fix version of HDFS-10519 > startTxId could be greater than endTxId when tailing in-progress edit log > - > > Key: HDFS-12836 > URL: https://issues.apache.org/jira/browse/HDFS-12836 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.0-alpha1 >Reporter: Chao Sun >Assignee: Chao Sun > > When {{dfs.ha.tail-edits.in-progress}} is true, edit log tailer will also > tail those in progress edit log segments. However, in the following code: > {code} > if (onlyDurableTxns && inProgressOk) { > endTxId = Math.min(endTxId, committedTxnId); > } > EditLogInputStream elis = EditLogFileInputStream.fromUrl( > connectionFactory, url, remoteLog.getStartTxId(), > endTxId, remoteLog.isInProgress()); > {code} > it is possible that {{remoteLog.getStartTxId()}} could be greater than > {{endTxId}}, and therefore will cause the following error: > {code} > 2017-11-17 19:55:41,165 ERROR org.apache.hadoop.hdfs.server.namenode.FSImage: > Error replaying edit log at offset 1048576. 
Expected transaction ID was 87 > Recent opcode offsets: 1048576 > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream$PrematureEOFException: > got premature end-of-file at txid 86; expected file to go up to 85 > at > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:197) > at > org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85) > at > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:189) > at > org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:205) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:481) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393) > 2017-11-17 19:55:41,165 WARN > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Error while reading > edits from disk. Will try again. > org.apache.hadoop.hdfs.server.namenode.EditLogInputException: Error replaying > edit log at offset 1048576. 
Expected transaction ID was 87 > Recent opcode offsets: 1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:218) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:481) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393) > Caused by: > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream$PrematureEOFException: > got premature end-of-file at txid 86; expected file to go up to 85 > at >
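The failure mode quoted above follows directly from the capping logic in the snippet: once {{endTxId}} is capped to {{committedTxnId}}, an in-progress segment whose start transaction lies beyond the committed point yields an inverted range. A hypothetical sketch of a guard for this case (the method name and parameters are illustrative, not the actual HDFS fix):

```java
// Hypothetical sketch: reproduce the Math.min() capping from the quoted code
// and detect the inverted range (startTxId > endTxId) that should be skipped
// instead of being handed to EditLogFileInputStream.fromUrl().
public class TxIdGuardSketch {
  public static boolean shouldSkipSegment(long startTxId, long endTxId,
      long committedTxnId, boolean inProgressOk, boolean onlyDurableTxns) {
    if (onlyDurableTxns && inProgressOk) {
      endTxId = Math.min(endTxId, committedTxnId);  // cap, as in the snippet
    }
    // An empty/inverted range means no durable transactions to read yet.
    return startTxId > endTxId;
  }
}
```

With the values from the quoted log (segment starting at txid 87 while only txid 85 is committed), the range inverts and the segment would be skipped rather than producing the PrematureEOFException.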
[jira] [Updated] (HDFS-12836) startTxId could be greater than endTxId when tailing in-progress edit log
[ https://issues.apache.org/jira/browse/HDFS-12836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-12836: --- Affects Version/s: 3.0.0-alpha1 > startTxId could be greater than endTxId when tailing in-progress edit log > - > > Key: HDFS-12836 > URL: https://issues.apache.org/jira/browse/HDFS-12836 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.0-alpha1 >Reporter: Chao Sun >Assignee: Chao Sun > > When {{dfs.ha.tail-edits.in-progress}} is true, edit log tailer will also > tail those in progress edit log segments. However, in the following code: > {code} > if (onlyDurableTxns && inProgressOk) { > endTxId = Math.min(endTxId, committedTxnId); > } > EditLogInputStream elis = EditLogFileInputStream.fromUrl( > connectionFactory, url, remoteLog.getStartTxId(), > endTxId, remoteLog.isInProgress()); > {code} > it is possible that {{remoteLog.getStartTxId()}} could be greater than > {{endTxId}}, and therefore will cause the following error: > {code} > 2017-11-17 19:55:41,165 ERROR org.apache.hadoop.hdfs.server.namenode.FSImage: > Error replaying edit log at offset 1048576. 
Expected transaction ID was 87 > Recent opcode offsets: 1048576 > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream$PrematureEOFException: > got premature end-of-file at txid 86; expected file to go up to 85 > at > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:197) > at > org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85) > at > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:189) > at > org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:205) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:481) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393) > 2017-11-17 19:55:41,165 WARN > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Error while reading > edits from disk. Will try again. > org.apache.hadoop.hdfs.server.namenode.EditLogInputException: Error replaying > edit log at offset 1048576. 
Expected transaction ID was 87 > Recent opcode offsets: 1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:218) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:481) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393) > Caused by: > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream$PrematureEOFException: > got premature end-of-file at txid 86; expected file to go up to 85 > at > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:197) > at >
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259536#comment-16259536 ] Allen Wittenauer commented on HDFS-12711: - FYI HADOOP-13514. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch, fakepatch.branch-2.txt > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-12832) INode.getFullPathName may throw ArrayIndexOutOfBoundsException lead to NameNode exit
[ https://issues.apache.org/jira/browse/HDFS-12832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259458#comment-16259458 ] Erik Krogen edited comment on HDFS-12832 at 11/20/17 4:42 PM: -- Thanks for reporting this and for working on a patch, [~Deng FEI]! Actually, the new method you are using, {{INode#getPathComponents()}} is subject to the same race condition. Generally {{INode}} is not meant to be a concurrent data structure as far as I can tell. I believe the issue is actually that {{ReplicationWork#chooseTargets()}} is being called without a lock: {code:title=BlockManager.ReplicationWork} // choose replication targets: NOT HOLDING THE GLOBAL LOCK // It is costly to extract the filename for which chooseTargets is called, // so for now we pass in the block collection itself. rw.chooseTargets(blockplacement, storagePolicySuite, excludedNodes); {code} Within {{chooseTargets()}} various methods on {{INode}}/{{BlockCollection}}, {{DatanodeDescriptor}}, {{DatanodeStorageInfo}}, and {{Block}} are called which it seems should not be allowed outside of the lock. [~Deng FEI], do you have a stack trace available to confirm that this is the same code path which caused your exception? This is the code path that was taken to trigger the issue for us. was (Author: xkrogen): Thanks for reporting this and for working on a patch, [~Deng FEI]]! Actually, the new method you are using, {{INode#getPathComponents()}} is subject to the same race condition. Generally {{INode}} is not meant to be a concurrent data structure as far as I can tell. I believe the issue is actually that {{ReplicationWork#chooseTargets()}} is being called without a lock: {code:title=BlockManager.ReplicationWork} // choose replication targets: NOT HOLDING THE GLOBAL LOCK // It is costly to extract the filename for which chooseTargets is called, // so for now we pass in the block collection itself. 
rw.chooseTargets(blockplacement, storagePolicySuite, excludedNodes); {code} Within {{chooseTargets()}} various methods on {{INode}}/{{BlockCollection}}, {{DatanodeDescriptor}}, {{DatanodeStorageInfo}}, and {{Block}} are called which it seems should not be allowed outside of the lock. [~Deng FEI], do you have a stack trace available to confirm that this is the same code path which caused your exception? This is the code path that was taken to trigger the issue for us. > INode.getFullPathName may throw ArrayIndexOutOfBoundsException lead to > NameNode exit > > > Key: HDFS-12832 > URL: https://issues.apache.org/jira/browse/HDFS-12832 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.4, 3.0.0-beta1 >Reporter: DENG FEI >Priority: Critical > Attachments: HDFS-12832-trunk-001.patch > > > {code:title=INode.java|borderStyle=solid} > public String getFullPathName() { > // Get the full path name of this inode. > if (isRoot()) { > return Path.SEPARATOR; > } > // compute size of needed bytes for the path > int idx = 0; > for (INode inode = this; inode != null; inode = inode.getParent()) { > // add component + delimiter (if not tail component) > idx += inode.getLocalNameBytes().length + (inode != this ? 1 : 0); > } > byte[] path = new byte[idx]; > for (INode inode = this; inode != null; inode = inode.getParent()) { > if (inode != this) { > path[--idx] = Path.SEPARATOR_CHAR; > } > byte[] name = inode.getLocalNameBytes(); > idx -= name.length; > System.arraycopy(name, 0, path, idx, name.length); > } > return DFSUtil.bytes2String(path); > } > {code} > We found ArrayIndexOutOfBoundsException at > _{color:#707070}System.arraycopy(name, 0, path, idx, name.length){color}_ > when ReplicaMonitor work ,and the NameNode will quit. > It seems the two loop is not synchronized, the path's length is changed. 
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
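The two-pass structure quoted in the issue (first loop sizes the array from the ancestor chain, second loop copies into it) is exactly what makes the race possible. The following hypothetical standalone model, with made-up names and a deliberately simplified separator rule, shows deterministically that a chain mutated between the two passes overruns the array:

```java
import java.util.List;

// Hypothetical model of getFullPathName()'s two passes: pass 1 sizes the
// byte[] from one snapshot of the ancestor chain, pass 2 copies from a
// (mutated) longer chain, reproducing the ArrayIndexOutOfBoundsException.
public class PathRaceSketch {
  static byte[] buildPath(List<byte[]> sizingChain, List<byte[]> copyChain) {
    int idx = 0;
    for (byte[] name : sizingChain) {   // pass 1: compute needed size
      idx += name.length + 1;           // component + separator
    }
    byte[] path = new byte[idx];
    for (int i = copyChain.size() - 1; i >= 0; i--) {  // pass 2: copy
      byte[] name = copyChain.get(i);
      path[--idx] = '/';
      idx -= name.length;
      System.arraycopy(name, 0, path, idx, name.length);
    }
    return path;
  }

  public static boolean raceCausesOverrun() {
    List<byte[]> before = List.of("a".getBytes(), "b".getBytes());
    List<byte[]> after =
        List.of("a".getBytes(), "b".getBytes(), "c".getBytes());
    try {
      buildPath(before, after);  // chain grew between the two passes
      return false;
    } catch (ArrayIndexOutOfBoundsException e) {
      return true;               // same failure as in ReplicationMonitor
    }
  }
}
```

This supports Erik Krogen's point: the fix is not a different traversal method (any two-pass read races the same way) but ensuring the chain cannot change underneath the call, i.e. holding the namesystem lock around {{chooseTargets()}}.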
[jira] [Commented] (HDFS-12832) INode.getFullPathName may throw ArrayIndexOutOfBoundsException lead to NameNode exit
[ https://issues.apache.org/jira/browse/HDFS-12832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259458#comment-16259458 ] Erik Krogen commented on HDFS-12832: Thanks for reporting this and for working on a patch, [~Deng FEI]]! Actually, the new method you are using, {{INode#getPathComponents()}} is subject to the same race condition. Generally {{INode}} is not meant to be a concurrent data structure as far as I can tell. I believe the issue is actually that {{ReplicationWork#chooseTargets()}} is being called without a lock: {code:title=BlockManager.ReplicationWork} // choose replication targets: NOT HOLDING THE GLOBAL LOCK // It is costly to extract the filename for which chooseTargets is called, // so for now we pass in the block collection itself. rw.chooseTargets(blockplacement, storagePolicySuite, excludedNodes); {code} Within {{chooseTargets()}} various methods on {{INode}}/{{BlockCollection}}, {{DatanodeDescriptor}}, {{DatanodeStorageInfo}}, and {{Block}} are called which it seems should not be allowed outside of the lock. [~Deng FEI], do you have a stack trace available to confirm that this is the same code path which caused your exception? This is the code path that was taken to trigger the issue for us. > INode.getFullPathName may throw ArrayIndexOutOfBoundsException lead to > NameNode exit > > > Key: HDFS-12832 > URL: https://issues.apache.org/jira/browse/HDFS-12832 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.4, 3.0.0-beta1 >Reporter: DENG FEI >Priority: Critical > Attachments: HDFS-12832-trunk-001.patch > > > {code:title=INode.java|borderStyle=solid} > public String getFullPathName() { > // Get the full path name of this inode. 
> if (isRoot()) { > return Path.SEPARATOR; > } > // compute size of needed bytes for the path > int idx = 0; > for (INode inode = this; inode != null; inode = inode.getParent()) { > // add component + delimiter (if not tail component) > idx += inode.getLocalNameBytes().length + (inode != this ? 1 : 0); > } > byte[] path = new byte[idx]; > for (INode inode = this; inode != null; inode = inode.getParent()) { > if (inode != this) { > path[--idx] = Path.SEPARATOR_CHAR; > } > byte[] name = inode.getLocalNameBytes(); > idx -= name.length; > System.arraycopy(name, 0, path, idx, name.length); > } > return DFSUtil.bytes2String(path); > } > {code} > We found an ArrayIndexOutOfBoundsException at > _{color:#707070}System.arraycopy(name, 0, path, idx, name.length){color}_ > when ReplicaMonitor runs, and the NameNode will quit. > It seems the two loops are not synchronized: the path's length changes between them.
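To make the failure mode concrete, here is a minimal, self-contained model of the two-pass pattern in {{getFullPathName()}} — not the HDFS code itself; class and method names are illustrative. Pass 1 sizes the buffer and pass 2 copies into it, so any mutation of the component chain in the window between the passes (standing in for a concurrent rename) can make the copy overrun the buffer:

```java
// Minimal model of the two-pass pattern: pass 1 sizes the buffer,
// pass 2 copies; a mutation between the passes makes the copy run
// out of bounds, as in the HDFS-12832 stack trace.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TwoPassRaceSketch {
    /** Joins path components with '/', sizing first and copying second;
     *  {@code betweenPasses} runs in the unsynchronized window. */
    static String join(List<byte[]> parts, Runnable betweenPasses) {
        int idx = 0;
        for (byte[] p : parts) {          // pass 1: compute needed bytes
            idx += p.length;
        }
        idx += parts.size() - 1;          // separators between components
        byte[] path = new byte[idx];

        betweenPasses.run();              // the window a lock should close

        for (int i = parts.size() - 1; i >= 0; i--) {  // pass 2: copy right-to-left
            byte[] name = parts.get(i);
            idx -= name.length;
            System.arraycopy(name, 0, path, idx, name.length);
            if (i > 0) path[--idx] = '/';
        }
        return new String(path);
    }

    public static void main(String[] args) {
        List<byte[]> parts = new ArrayList<>(
            Arrays.asList("dir".getBytes(), "file".getBytes()));
        System.out.println(join(parts, () -> {}));  // quiescent tree: "dir/file"
        try {
            // A "rename" that grows a component between the two passes:
            join(parts, () -> parts.set(1, "renamed-file".getBytes()));
        } catch (IndexOutOfBoundsException e) {
            System.out.println("copy overran the buffer, as in HDFS-12832");
        }
    }
}
```

Holding the global lock across both passes (or snapshotting the components once) removes the window entirely, which is why the fix belongs in the caller rather than in {{INode}} itself.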
[jira] [Commented] (HDFS-12823) Backport HDFS-9259 "Make SO_SNDBUF size configurable at DFSClient" to branch-2.7
[ https://issues.apache.org/jira/browse/HDFS-12823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259393#comment-16259393 ] Erik Krogen commented on HDFS-12823: Thank you [~manojg] and [~zhz]! > Backport HDFS-9259 "Make SO_SNDBUF size configurable at DFSClient" to > branch-2.7 > > > Key: HDFS-12823 > URL: https://issues.apache.org/jira/browse/HDFS-12823 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, hdfs-client >Reporter: Erik Krogen >Assignee: Erik Krogen > Fix For: 2.7.5 > > Attachments: HDFS-12823-branch-2.7.000.patch, > HDFS-12823-branch-2.7.001.patch, HDFS-12823-branch-2.7.002.patch > > > Given the pretty significant performance implications of HDFS-9259 (see > discussion in HDFS-10326) when doing transfers across high latency links, it > would be helpful to have this configurability exist in the 2.7 series. > Opening a new JIRA since the original HDFS-9259 has been closed for a while > and there are conflicts due to a few classes moving.
[jira] [Updated] (HDFS-12807) Ozone: Expose RockDB stats via JMX for Ozone metadata stores
[ https://issues.apache.org/jira/browse/HDFS-12807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elek, Marton updated HDFS-12807: Status: Patch Available (was: In Progress) > Ozone: Expose RockDB stats via JMX for Ozone metadata stores > > > Key: HDFS-12807 > URL: https://issues.apache.org/jira/browse/HDFS-12807 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Elek, Marton > Attachments: HDFS-12807-HDFS-7240.001.patch, > HDFS-12807-HDFS-7240.002.patch > > > RocksDB JNI has an option to expose stats, which can be further exposed to > graphs and monitoring applications. We should expose them through our RocksDB > metadata store implementation for troubleshooting metadata-related > performance issues.
[jira] [Updated] (HDFS-12807) Ozone: Expose RockDB stats via JMX for Ozone metadata stores
[ https://issues.apache.org/jira/browse/HDFS-12807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elek, Marton updated HDFS-12807: Attachment: HDFS-12807-HDFS-7240.002.patch Second version. Fixed the case when the conf is null (unit tests). > Ozone: Expose RockDB stats via JMX for Ozone metadata stores > > > Key: HDFS-12807 > URL: https://issues.apache.org/jira/browse/HDFS-12807 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Elek, Marton > Attachments: HDFS-12807-HDFS-7240.001.patch, > HDFS-12807-HDFS-7240.002.patch > > > RocksDB JNI has an option to expose stats, which can be further exposed to > graphs and monitoring applications. We should expose them through our RocksDB > metadata store implementation for troubleshooting metadata-related > performance issues.
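For reviewers wanting the rough shape of "stats via JMX": a stdlib-only sketch of publishing a counter snapshot as an MXBean. The RocksDB {{Statistics}} object is stubbed with fixed numbers here (rocksdbjni is not assumed on the classpath), and the {{ObjectName}} is illustrative, not the one the patch registers:

```java
// Stdlib-only sketch: expose a map of ticker counters over JMX.
// The values are stand-ins for org.rocksdb.Statistics ticker counts.
import java.lang.management.ManagementFactory;
import java.util.Map;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class RocksDbStatsJmxSketch {
    /** The "MXBean" suffix lets the platform server convert the Map
     *  into open (TabularData) types readable by any JMX client. */
    public interface DbStatsMXBean {
        Map<String, Long> getTickers();
    }

    /** Stand-in for a snapshot of the DB's ticker counters. */
    public static class DbStats implements DbStatsMXBean {
        @Override
        public Map<String, Long> getTickers() {
            return Map.of("BYTES_WRITTEN", 1024L, "BYTES_READ", 2048L);
        }
    }

    /** Register the bean so jconsole/monitoring tools can read the counters. */
    public static ObjectName register() throws JMException {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name =
            new ObjectName("Hadoop:service=OzoneMetadataStore,name=RocksDbStats");
        server.registerMBean(new DbStats(), name);
        return name;
    }
}
```

In the real patch the getter would read live counters from the store's statistics object rather than a constant map; the JMX plumbing is the same.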
[jira] [Commented] (HDFS-12754) Lease renewal can hit a deadlock
[ https://issues.apache.org/jira/browse/HDFS-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259251#comment-16259251 ] Hadoop QA commented on HDFS-12754: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 15m 57s{color} | {color:red} Docker failed to build yetus/hadoop:5b98639. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-12754 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12898486/HDFS-12754.008.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/22142/console | | Powered by | Apache Yetus 0.7.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Lease renewal can hit a deadlock > - > > Key: HDFS-12754 > URL: https://issues.apache.org/jira/browse/HDFS-12754 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: HDFS-12754.001.patch, HDFS-12754.002.patch, > HDFS-12754.003.patch, HDFS-12754.004.patch, HDFS-12754.005.patch, > HDFS-12754.006.patch, HDFS-12754.007.patch, HDFS-12754.008.patch > > > The client and the renewer can hit a deadlock during the close operation since > closeFile() reaches back into DFSClient#removeFileBeingWritten. This is > possible if the client calls close while the renewer is renewing a lease.
[jira] [Updated] (HDFS-12754) Lease renewal can hit a deadlock
[ https://issues.apache.org/jira/browse/HDFS-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuhu Shukla updated HDFS-12754: --- Attachment: HDFS-12754.008.patch Attaching a patch that improves the test by coordinating the leaseRenewer and the dfsClient with a latch. Also changed the visibility of the grace default to private. > Lease renewal can hit a deadlock > - > > Key: HDFS-12754 > URL: https://issues.apache.org/jira/browse/HDFS-12754 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: HDFS-12754.001.patch, HDFS-12754.002.patch, > HDFS-12754.003.patch, HDFS-12754.004.patch, HDFS-12754.005.patch, > HDFS-12754.006.patch, HDFS-12754.007.patch, HDFS-12754.008.patch > > > The client and the renewer can hit a deadlock during the close operation since > closeFile() reaches back into DFSClient#removeFileBeingWritten. This is > possible if the client calls close while the renewer is renewing a lease.
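As a minimal model of the inversion described in this issue (not the actual DFSClient/LeaseRenewer code; class and lock names are illustrative): the closing thread holds the client's lock and then needs the renewer's, while the renewer holds its own lock and reaches back into the client. Latches force the bad interleaving deterministically, and {{tryLock}} stands in for the blocking acquire so the demo reports the wait-for cycle instead of actually hanging:

```java
// Two-lock model of the close()/renew() inversion. Latches guarantee both
// locks are held while each side probes the other, so the cycle is
// observed deterministically without a real hang.
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class LeaseDeadlockSketch {
    /** Returns true when each thread, holding its own lock, found the
     *  other lock taken: the wait-for cycle of the reported deadlock. */
    public static boolean demonstrate() throws InterruptedException {
        ReentrantLock clientLock = new ReentrantLock();   // models DFSClient state
        ReentrantLock renewerLock = new ReentrantLock();  // models LeaseRenewer state
        CountDownLatch bothHeld = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        boolean[] blocked = new boolean[2];

        Thread closer = new Thread(() -> {
            clientLock.lock();                  // client begins close()
            try {
                bothHeld.countDown(); awaitQuietly(bothHeld);
                blocked[0] = !tryOnce(renewerLock);  // closeFile() needs the renewer
                bothTried.countDown(); awaitQuietly(bothTried);
            } finally { clientLock.unlock(); }
        });
        Thread renewer = new Thread(() -> {
            renewerLock.lock();                 // renewer begins a renewal pass
            try {
                bothHeld.countDown(); awaitQuietly(bothHeld);
                blocked[1] = !tryOnce(clientLock);   // removeFileBeingWritten()
                bothTried.countDown(); awaitQuietly(bothTried);
            } finally { renewerLock.unlock(); }
        });
        closer.start(); renewer.start();
        closer.join(); renewer.join();
        return blocked[0] && blocked[1];
    }

    private static boolean tryOnce(ReentrantLock lock) {
        boolean got = lock.tryLock();
        if (got) lock.unlock();
        return got;
    }
    private static void awaitQuietly(CountDownLatch latch) {
        try { latch.await(); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
```

The fix direction the patch takes — coordinating the two sides so the re-entrant call happens outside the held lock — breaks exactly this cycle.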
[jira] [Commented] (HDFS-12740) SCM should support a RPC to share the cluster Id with KSM and DataNodes
[ https://issues.apache.org/jira/browse/HDFS-12740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259142#comment-16259142 ] Hadoop QA commented on HDFS-12740: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} HDFS-7240 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 30s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 49s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 42s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 44s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 46s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 13s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 34s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 59s{color} | {color:green} HDFS-7240 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 34s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 23s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}156m 29s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}219m 43s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.ozone.ozShell.TestOzoneShell | | | hadoop.fs.TestUnbuffer | | | hadoop.ozone.container.common.impl.TestContainerPersistence | | | hadoop.hdfs.web.TestWebHdfsTimeouts | | | hadoop.ozone.web.client.TestKeys | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure | | | hadoop.cblock.TestBufferManager | | | hadoop.ozone.client.rpc.TestOzoneRpcClient | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting | | | hadoop.hdfs.server.balancer.TestBalancerRPCDelay | | | hadoop.cblock.TestCBlockReadWrite | | | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes | | | hadoop.ozone.TestOzoneConfigurationFields | | Timed out junit tests |
[jira] [Commented] (HDFS-12787) Ozone: SCM: Aggregate the metrics from all the container reports
[ https://issues.apache.org/jira/browse/HDFS-12787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259058#comment-16259058 ] Hadoop QA commented on HDFS-12787: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 14m 40s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} HDFS-7240 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 0s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 13s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 40s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 19s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 32s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 11s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s{color} | {color:green} HDFS-7240 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 31s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}141m 9s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}214m 36s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Unreaped Processes | hadoop-hdfs:2 | | Failed junit tests | hadoop.ozone.ozShell.TestOzoneShell | | | hadoop.ozone.TestOzoneConfigurationFields | | | hadoop.hdfs.server.namenode.ha.TestEditLogTailer | | | hadoop.ozone.container.common.impl.TestContainerPersistence | | | hadoop.ozone.client.rpc.TestOzoneRpcClient | | | hadoop.cblock.TestCBlockReadWrite | | | hadoop.fs.TestUnbuffer | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting | | | hadoop.ozone.web.client.TestKeys | | Timed out junit tests | org.apache.hadoop.hdfs.TestLeaseRecovery2 | | | org.apache.hadoop.cblock.TestLocalBlockCache | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d11161b | | JIRA Issue | HDFS-12787 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12898422/HDFS-12787-HDFS-7240.006.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 5877da42727e 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | HDFS-7240 / 5dc4dfa | | maven | version:
[jira] [Commented] (HDFS-12820) Decommissioned datanode is counted in service cause datanode allcating failure
[ https://issues.apache.org/jira/browse/HDFS-12820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258981#comment-16258981 ] Gang Xie commented on HDFS-12820: - Why don't we subtract nodesInService when the datanode completes decommission and becomes dead? Any considerations here? > Decommissioned datanode is counted in service cause datanode allcating failure > -- > > Key: HDFS-12820 > URL: https://issues.apache.org/jira/browse/HDFS-12820 > Project: Hadoop HDFS > Issue Type: Bug > Components: block placement >Affects Versions: 2.4.0 >Reporter: Gang Xie > > When allocating a datanode for a dfsclient write with the load considered, the > placement policy checks whether a datanode is overloaded by calculating the > average xceiver count of all in-service datanodes. But if a datanode is > decommissioned and becomes dead, it is still treated as in service, which makes > the average load much lower than the real one, especially when the number of > decommissioned datanodes is large. In our cluster of 180 datanodes, 100 are > decommissioned, and the average load is 17. This fails all datanode allocations. > private void subtract(final DatanodeDescriptor node) { > capacityUsed -= node.getDfsUsed(); > blockPoolUsed -= node.getBlockPoolUsed(); > xceiverCount -= node.getXceiverCount(); > {color:red} if (!(node.isDecommissionInProgress() || > node.isDecommissioned())) {{color} > nodesInService--; > nodesInServiceXceiverCount -= node.getXceiverCount(); > capacityTotal -= node.getCapacity(); > capacityRemaining -= node.getRemaining(); > } else { > capacityTotal -= node.getDfsUsed(); > } > cacheCapacity -= node.getCacheCapacity(); > cacheUsed -= node.getCacheUsed(); > }
[jira] [Comment Edited] (HDFS-12820) Decommissioned datanode is counted in service cause datanode allcating failure
[ https://issues.apache.org/jira/browse/HDFS-12820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258974#comment-16258974 ] Gang Xie edited comment on HDFS-12820 at 11/20/17 9:11 AM: --- After carefully checking the issue reported in HDFS-9279, I found this issue is not its dup. This case is mainly about the param {color:#d04437}nodesInService{color}. When a datanode is decommissioned and then dead, nodesInService will not be subtracted. Then during allocation the dead node is counted in the maxload calculation, which makes the maxload very low and, in turn, causes every allocation to fail. if (considerLoad) { {color:#d04437} final double maxLoad = maxLoadRatio * stats.getInServiceXceiverAverage();{color} final int nodeLoad = node.getXceiverCount(); if (nodeLoad > maxLoad) { logNodeIsNotChosen(storage, "the node is too busy (load:"+nodeLoad+" > "+maxLoad+") "); stats.incrOverLoaded(); return false; } } was (Author: xiegang112): After carefully check the issue reported in HDFS-9279, and found this issue is not a its dup. this case is mainly about the param {color:#d04437}nodesInService{color}. When a datanode is decommissioned and then dead. nodesInService will not be subtracted. Then when allocation, the dead node will be counted in the maxload, which makes the maxload very low, in turns, causes any allocation failing. 
if (considerLoad) { {color:#d04437} final double maxLoad = maxLoadRatio * stats.getInServiceXceiverAverage();{color} final int nodeLoad = node.getXceiverCount(); if (nodeLoad > maxLoad) { logNodeIsNotChosen(storage, "the node is too busy (load:"+nodeLoad+" > "+maxLoad+") "); stats.incrOverLoaded(); return false; } } > Decommissioned datanode is counted in service cause datanode allcating failure > -- > > Key: HDFS-12820 > URL: https://issues.apache.org/jira/browse/HDFS-12820 > Project: Hadoop HDFS > Issue Type: Bug > Components: block placement >Affects Versions: 2.4.0 >Reporter: Gang Xie > > When allocating a datanode for a dfsclient write with the load considered, the > placement policy checks whether a datanode is overloaded by calculating the > average xceiver count of all in-service datanodes. But if a datanode is > decommissioned and becomes dead, it is still treated as in service, which makes > the average load much lower than the real one, especially when the number of > decommissioned datanodes is large. In our cluster of 180 datanodes, 100 are > decommissioned, and the average load is 17. This fails all datanode allocations. > private void subtract(final DatanodeDescriptor node) { > capacityUsed -= node.getDfsUsed(); > blockPoolUsed -= node.getBlockPoolUsed(); > xceiverCount -= node.getXceiverCount(); > {color:red} if (!(node.isDecommissionInProgress() || > node.isDecommissioned())) {{color} > nodesInService--; > nodesInServiceXceiverCount -= node.getXceiverCount(); > capacityTotal -= node.getCapacity(); > capacityRemaining -= node.getRemaining(); > } else { > capacityTotal -= node.getDfsUsed(); > } > cacheCapacity -= node.getCacheCapacity(); > cacheUsed -= node.getCacheUsed(); > }
[jira] [Reopened] (HDFS-12820) Decommissioned datanode is counted in service cause datanode allcating failure
[ https://issues.apache.org/jira/browse/HDFS-12820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Xie reopened HDFS-12820: - After carefully checking the issue reported in HDFS-9279, I found this issue is not its dup. This case is mainly about the param {color:#d04437}nodesInService{color}. When a datanode is decommissioned and then dead, nodesInService will not be subtracted. Then during allocation the dead node is counted in the maxload calculation, which makes the maxload very low and, in turn, causes every allocation to fail. if (considerLoad) { {color:#d04437} final double maxLoad = maxLoadRatio * stats.getInServiceXceiverAverage();{color} final int nodeLoad = node.getXceiverCount(); if (nodeLoad > maxLoad) { logNodeIsNotChosen(storage, "the node is too busy (load:"+nodeLoad+" > "+maxLoad+") "); stats.incrOverLoaded(); return false; } } > Decommissioned datanode is counted in service cause datanode allcating failure > -- > > Key: HDFS-12820 > URL: https://issues.apache.org/jira/browse/HDFS-12820 > Project: Hadoop HDFS > Issue Type: Bug > Components: block placement >Affects Versions: 2.4.0 >Reporter: Gang Xie > > When allocating a datanode for a dfsclient write with the load considered, the > placement policy checks whether a datanode is overloaded by calculating the > average xceiver count of all in-service datanodes. But if a datanode is > decommissioned and becomes dead, it is still treated as in service, which makes > the average load much lower than the real one, especially when the number of > decommissioned datanodes is large. In our cluster of 180 datanodes, 100 are > decommissioned, and the average load is 17. This fails all datanode allocations. 
> private void subtract(final DatanodeDescriptor node) { > capacityUsed -= node.getDfsUsed(); > blockPoolUsed -= node.getBlockPoolUsed(); > xceiverCount -= node.getXceiverCount(); > {color:red} if (!(node.isDecommissionInProgress() || > node.isDecommissioned())) {{color} > nodesInService--; > nodesInServiceXceiverCount -= node.getXceiverCount(); > capacityTotal -= node.getCapacity(); > capacityRemaining -= node.getRemaining(); > } else { > capacityTotal -= node.getDfsUsed(); > } > cacheCapacity -= node.getCacheCapacity(); > cacheUsed -= node.getCacheUsed(); > }
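The arithmetic of the failure can be sketched in a standalone model (hypothetical names, not the BlockPlacementPolicy code): when dead-but-still-counted decommissioned nodes inflate {{nodesInService}}, the computed average — and hence maxLoad — drops below the true per-node load, so every candidate node looks too busy:

```java
// Hypothetical model of the considerLoad check described in this issue:
// maxLoad = maxLoadRatio * (inServiceXceiverTotal / nodesInService).
public class MaxLoadSketch {
    static boolean tooBusy(int nodeXceivers, long inServiceXceiverTotal,
                           int nodesInService, double maxLoadRatio) {
        double avg = nodesInService == 0
            ? 0.0 : (double) inServiceXceiverTotal / nodesInService;
        return nodeXceivers > maxLoadRatio * avg;
    }

    public static void main(String[] args) {
        // 80 live nodes handling 2400 xceivers: true average is 30,
        // so a node at 30 is well under maxLoad (60) and is accepted.
        System.out.println(tooBusy(30, 2400, 80, 2.0));   // false
        // 100 dead-but-counted decommissioned nodes inflate the denominator
        // to 180: average ~13.3, maxLoad ~26.7, and the same node is rejected.
        System.out.println(tooBusy(30, 2400, 180, 2.0));  // true
    }
}
```

With numbers like the reporter's (180 counted nodes, 100 decommissioned), every normally loaded node exceeds the deflated maxLoad, matching the "all allocations fail" symptom.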
[jira] [Commented] (HDFS-12820) Decommissioned datanode is counted in service cause datanode allcating failure
[ https://issues.apache.org/jira/browse/HDFS-12820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258975#comment-16258975 ] Gang Xie commented on HDFS-12820: - And I believe this issue still exists in the latest version. > Decommissioned datanode is counted in service cause datanode allcating failure > -- > > Key: HDFS-12820 > URL: https://issues.apache.org/jira/browse/HDFS-12820 > Project: Hadoop HDFS > Issue Type: Bug > Components: block placement >Affects Versions: 2.4.0 >Reporter: Gang Xie > > When allocating a datanode for a dfsclient write with the load considered, the > placement policy checks whether a datanode is overloaded by calculating the > average xceiver count of all in-service datanodes. But if a datanode is > decommissioned and becomes dead, it is still treated as in service, which makes > the average load much lower than the real one, especially when the number of > decommissioned datanodes is large. In our cluster of 180 datanodes, 100 are > decommissioned, and the average load is 17. This fails all datanode allocations. > private void subtract(final DatanodeDescriptor node) { > capacityUsed -= node.getDfsUsed(); > blockPoolUsed -= node.getBlockPoolUsed(); > xceiverCount -= node.getXceiverCount(); > {color:red} if (!(node.isDecommissionInProgress() || > node.isDecommissioned())) {{color} > nodesInService--; > nodesInServiceXceiverCount -= node.getXceiverCount(); > capacityTotal -= node.getCapacity(); > capacityRemaining -= node.getRemaining(); > } else { > capacityTotal -= node.getDfsUsed(); > } > cacheCapacity -= node.getCacheCapacity(); > cacheUsed -= node.getCacheUsed(); > }
[jira] [Updated] (HDFS-12740) SCM should support a RPC to share the cluster Id with KSM and DataNodes
[ https://issues.apache.org/jira/browse/HDFS-12740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDFS-12740: --- Attachment: HDFS-12740-HDFS-7240.004.patch Rebased patch v3. > SCM should support a RPC to share the cluster Id with KSM and DataNodes > --- > > Key: HDFS-12740 > URL: https://issues.apache.org/jira/browse/HDFS-12740 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-7240 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee > Fix For: HDFS-7240 > > Attachments: HDFS-12740-HDFS-7240.001.patch, > HDFS-12740-HDFS-7240.002.patch, HDFS-12740-HDFS-7240.003.patch, > HDFS-12740-HDFS-7240.004.patch > > > When the ozone cluster is first created, the SCM --init command will generate > a cluster Id as well as an SCM Id and persist them locally. The same cluster Id > and SCM Id will be shared with KSM during KSM initialization and with > Datanodes during datanode registration.
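A rough sketch of the handshake shape described above (all names are illustrative, not the actual Ozone protocol classes): SCM persists its identifiers at {{--init}} time and serves them over an RPC that KSM and datanodes consult before registering, refusing to join a mismatched cluster:

```java
// Illustrative model of the cluster-id sharing described in this issue;
// not the actual Ozone RPC, and every name here is hypothetical.
import java.util.Objects;
import java.util.UUID;

public class ClusterIdHandshakeSketch {
    /** Immutable identifiers generated once by "scm --init" and persisted. */
    public static final class ScmInfo {
        private final String clusterId;
        private final String scmId;
        public ScmInfo(String clusterId, String scmId) {
            this.clusterId = Objects.requireNonNull(clusterId);
            this.scmId = Objects.requireNonNull(scmId);
        }
        public String getClusterId() { return clusterId; }
        public String getScmId() { return scmId; }
    }

    /** The RPC surface KSM and datanodes would call before registering. */
    public interface ScmInfoRpc {
        ScmInfo getScmInfo();
    }

    /** SCM side: generate at init, then hand out the same values forever. */
    public static final class Scm implements ScmInfoRpc {
        private final ScmInfo info = new ScmInfo(
            "CID-" + UUID.randomUUID(), "SCM-" + UUID.randomUUID());
        @Override public ScmInfo getScmInfo() { return info; }
    }

    /** Client side: adopt the id on first contact, otherwise require a match. */
    public static boolean matchesPersisted(ScmInfo fromScm, String persistedClusterId) {
        return persistedClusterId == null
            || persistedClusterId.equals(fromScm.getClusterId());
    }
}
```

The key property the real RPC provides is the same: the identifiers are generated exactly once, and every later registration validates against them instead of regenerating.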