[jira] [Commented] (HDFS-13369) FSCK Report broken with RequestHedgingProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17616774#comment-17616774 ] Sammi Chen commented on HDFS-13369: --- Cherry-picked from trunk to branch-3.3.5. > FSCK Report broken with RequestHedgingProxyProvider > > > Key: HDFS-13369 > URL: https://issues.apache.org/jira/browse/HDFS-13369 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.8.3, 3.0.0, 3.1.0 >Reporter: Harshakiran Reddy >Assignee: Ranith Sardar >Priority: Major > Labels: pull-request-available > Fix For: 3.3.5 > > Attachments: HDFS-13369.001.patch, HDFS-13369.002.patch, > HDFS-13369.003.patch, HDFS-13369.004.patch, HDFS-13369.005.patch, > HDFS-13369.006.patch, HDFS-13369.007.patch > > > Scenario:- > 1.Configure the RequestHedgingProxy > 2. write some files in file system > 3. Take FSCK report for the above files > > {noformat} > bin> hdfs fsck /file1 -locations -files -blocks > Exception in thread "main" java.lang.ClassCastException: > org.apache.hadoop.hdfs.server.namenode.ha.RequestHedgingProxyProvider$RequestHedgingInvocationHandler > cannot be cast to org.apache.hadoop.ipc.RpcInvocationHandler > at org.apache.hadoop.ipc.RPC.getConnectionIdForProxy(RPC.java:626) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.getConnectionId(RetryInvocationHandler.java:438) > at org.apache.hadoop.ipc.RPC.getConnectionIdForProxy(RPC.java:628) > at org.apache.hadoop.ipc.RPC.getServerAddress(RPC.java:611) > at org.apache.hadoop.hdfs.HAUtil.getAddressOfActive(HAUtil.java:263) > at > org.apache.hadoop.hdfs.tools.DFSck.getCurrentNamenodeAddress(DFSck.java:257) > at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:319) > at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:72) > at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:156) > at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:153) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836) > at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:152) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:385){noformat} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
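For context on the stack trace above: RPC.getConnectionIdForProxy() unwraps the dynamic proxy and assumes the InvocationHandler it finds implements org.apache.hadoop.ipc.RpcInvocationHandler, but RequestHedgingProxyProvider's handler only implements InvocationHandler, so the cast fails. A minimal, self-contained Java sketch of the failure pattern (the interface and class names below are stand-ins for illustration, not the actual Hadoop types):

{code:java}
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class CastDemo {
  interface ClientProtocol {}                             // stands in for the RPC protocol
  interface RpcStyleHandler extends InvocationHandler {}  // stands in for RpcInvocationHandler

  // A handler that implements only InvocationHandler, like RequestHedgingInvocationHandler.
  static class HedgingHandler implements InvocationHandler {
    @Override
    public Object invoke(Object proxy, Method method, Object[] args) {
      return null;
    }
  }

  public static void main(String[] args) {
    ClientProtocol proxy = (ClientProtocol) Proxy.newProxyInstance(
        CastDemo.class.getClassLoader(),
        new Class<?>[]{ClientProtocol.class},
        new HedgingHandler());
    InvocationHandler h = Proxy.getInvocationHandler(proxy);
    // The unchecked assumption made in RPC.getConnectionIdForProxy:
    RpcStyleHandler rpcHandler = (RpcStyleHandler) h;     // throws ClassCastException
  }
}
{code}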
[jira] [Updated] (HDFS-13369) FSCK Report broken with RequestHedgingProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDFS-13369: -- Fix Version/s: 3.3.5 Resolution: Fixed Status: Resolved (was: Patch Available) > FSCK Report broken with RequestHedgingProxyProvider > > > Key: HDFS-13369 > URL: https://issues.apache.org/jira/browse/HDFS-13369 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.8.3, 3.0.0, 3.1.0 >Reporter: Harshakiran Reddy >Assignee: Ranith Sardar >Priority: Major > Labels: pull-request-available > Fix For: 3.3.5 > > Attachments: HDFS-13369.001.patch, HDFS-13369.002.patch, > HDFS-13369.003.patch, HDFS-13369.004.patch, HDFS-13369.005.patch, > HDFS-13369.006.patch, HDFS-13369.007.patch > > > Scenario:- > 1.Configure the RequestHedgingProxy > 2. write some files in file system > 3. Take FSCK report for the above files > > {noformat} > bin> hdfs fsck /file1 -locations -files -blocks > Exception in thread "main" java.lang.ClassCastException: > org.apache.hadoop.hdfs.server.namenode.ha.RequestHedgingProxyProvider$RequestHedgingInvocationHandler > cannot be cast to org.apache.hadoop.ipc.RpcInvocationHandler > at org.apache.hadoop.ipc.RPC.getConnectionIdForProxy(RPC.java:626) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.getConnectionId(RetryInvocationHandler.java:438) > at org.apache.hadoop.ipc.RPC.getConnectionIdForProxy(RPC.java:628) > at org.apache.hadoop.ipc.RPC.getServerAddress(RPC.java:611) > at org.apache.hadoop.hdfs.HAUtil.getAddressOfActive(HAUtil.java:263) > at > org.apache.hadoop.hdfs.tools.DFSck.getCurrentNamenodeAddress(DFSck.java:257) > at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:319) > at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:72) > at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:156) > at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:153) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836) > at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:152) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:385){noformat} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13369) FSCK Report broken with RequestHedgingProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614866#comment-17614866 ] Sammi Chen commented on HDFS-13369: --- Hi @navinko, could you submit the same PR for branches 2.8.0, 3.0.0, and 3.1.0 too? > FSCK Report broken with RequestHedgingProxyProvider > > > Key: HDFS-13369 > URL: https://issues.apache.org/jira/browse/HDFS-13369 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.8.3, 3.0.0, 3.1.0 >Reporter: Harshakiran Reddy >Assignee: Ranith Sardar >Priority: Major > Labels: pull-request-available > Attachments: HDFS-13369.001.patch, HDFS-13369.002.patch, > HDFS-13369.003.patch, HDFS-13369.004.patch, HDFS-13369.005.patch, > HDFS-13369.006.patch, HDFS-13369.007.patch > > > Scenario:- > 1.Configure the RequestHedgingProxy > 2. write some files in file system > 3. Take FSCK report for the above files > > {noformat} > bin> hdfs fsck /file1 -locations -files -blocks > Exception in thread "main" java.lang.ClassCastException: > org.apache.hadoop.hdfs.server.namenode.ha.RequestHedgingProxyProvider$RequestHedgingInvocationHandler > cannot be cast to org.apache.hadoop.ipc.RpcInvocationHandler > at org.apache.hadoop.ipc.RPC.getConnectionIdForProxy(RPC.java:626) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.getConnectionId(RetryInvocationHandler.java:438) > at org.apache.hadoop.ipc.RPC.getConnectionIdForProxy(RPC.java:628) > at org.apache.hadoop.ipc.RPC.getServerAddress(RPC.java:611) > at org.apache.hadoop.hdfs.HAUtil.getAddressOfActive(HAUtil.java:263) > at > org.apache.hadoop.hdfs.tools.DFSck.getCurrentNamenodeAddress(DFSck.java:257) > at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:319) > at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:72) > at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:156) > at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:153) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836) > at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:152) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:385){noformat} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
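For reference, backporting to older release lines is usually done by cherry-picking the trunk commit onto each branch and re-running the affected tests before opening the backport PR. A sketch, assuming the usual Hadoop branch naming, a placeholder commit hash, and that TestRequestHedgingProxyProvider is the relevant test class:

{noformat}
git checkout branch-3.1
git cherry-pick -x <trunk-commit-hash>
# re-run the provider tests before opening the backport PR
mvn -pl hadoop-hdfs-project/hadoop-hdfs test -Dtest=TestRequestHedgingProxyProvider
{noformat}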
[jira] [Assigned] (HDDS-2602) Add a property to enable/disable ONE replica pipeline auto creation in SCMPipelineManager
[ https://issues.apache.org/jira/browse/HDDS-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen reassigned HDDS-2602: Assignee: Li Cheng > Add a property to enable/disable ONE replica pipeline auto creation in > SCMPipelineManager > - > > Key: HDDS-2602 > URL: https://issues.apache.org/jira/browse/HDDS-2602 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Sammi Chen >Assignee: Li Cheng >Priority: Major > > ONE replica RATIS pipelines are not favored in production clusters. Add a > property to disable automatic creation of ONE replica RATIS pipelines in > SCMPipelineManager. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2602) Add a property to enable/disable ONE replica pipeline auto creation in SCMPipelineManager
Sammi Chen created HDDS-2602: Summary: Add a property to enable/disable ONE replica pipeline auto creation in SCMPipelineManager Key: HDDS-2602 URL: https://issues.apache.org/jira/browse/HDDS-2602 Project: Hadoop Distributed Data Store Issue Type: Improvement Reporter: Sammi Chen ONE replica RATIS pipelines are not favored in production clusters. Add a property to disable automatic creation of ONE replica RATIS pipelines in SCMPipelineManager. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
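A sketch of what such a switch could look like in ozone-site.xml; the property name below is an assumption for illustration, not necessarily the one the patch ends up adding:

{code:xml}
<property>
  <name>ozone.scm.pipeline.creation.auto.factor.one</name>
  <value>false</value>
  <description>When false, SCMPipelineManager does not automatically
  create RATIS pipelines with replication factor ONE.</description>
</property>
{code}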
[jira] [Created] (HDDS-2540) Fix acceptance test failure introduced by wait_for_safemode_exit
Sammi Chen created HDDS-2540: Summary: Fix acceptance test failure introduced by wait_for_safemode_exit Key: HDDS-2540 URL: https://issues.apache.org/jira/browse/HDDS-2540 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Sammi Chen Assignee: Sammi Chen https://github.com/apache/hadoop-ozone/blob/1b72718dcab7f83ebdac67b6242c729f03a8f103/hadoop-ozone/dist/src/main/compose/testlib.sh#L97 - status=`docker-compose -f "${compose_file}" exec -T scm bash -c "kinit -k HTTP/s...@example.com -t /etc/security/keytabs/HTTP.keytab && $command'"` + status=`docker-compose -f "${compose_file}" exec -T scm bash -c "kinit -k HTTP/s...@example.com -t /etc/security/keytabs/HTTP.keytab && $command"` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
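The diff above removes a stray single quote that was being passed to bash as part of the command string, leaving an unterminated quote. A minimal reproduction of the failure mode (placeholder command, not the actual kinit line):

{noformat}
$ command="echo ok"
$ bash -c "true && $command'"    # broken: the command string ends with a lone '
bash: -c: line 1: unexpected EOF while looking for matching `''
$ bash -c "true && $command"     # fixed
ok
{noformat}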
[jira] [Resolved] (HDDS-2499) IsLeader information is lost when update pipeline state
[ https://issues.apache.org/jira/browse/HDDS-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen resolved HDDS-2499. -- Fix Version/s: 0.5.0 Resolution: Fixed > IsLeader information is lost when update pipeline state > --- > > Key: HDDS-2499 > URL: https://issues.apache.org/jira/browse/HDDS-2499 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2499) IsLeader information is lost when update pipeline state
Sammi Chen created HDDS-2499: Summary: IsLeader information is lost when update pipeline state Key: HDDS-2499 URL: https://issues.apache.org/jira/browse/HDDS-2499 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Sammi Chen Assignee: Sammi Chen -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDDS-2249) SortDatanodes does not return correct orders when many DNs on a given host
[ https://issues.apache.org/jira/browse/HDDS-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974047#comment-16974047 ] Sammi Chen edited comment on HDDS-2249 at 11/14/19 8:47 AM: Thanks [~swagle] for reporting this. One idea that comes to mind: how about using hostname:port as the key in dnsToUuidMap? If it works, it might solve this issue. was (Author: sammi): Thanks [~swagle] for reporting this. One idea that comes to mind: how about using hostname:port as the key in dnsToUuidMap? If it works, will it solve this issue? > SortDatanodes does not return correct orders when many DNs on a given host > -- > > Key: HDDS-2249 > URL: https://issues.apache.org/jira/browse/HDDS-2249 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.5.0 >Reporter: Stephen O'Donnell >Priority: Major > > In HDDS-2199 ScmNodeManager.getNodeByAddress() was changed to return a list > of nodes rather than a single entry, to handle the case where many datanodes > are running on the same host. > In SCMBlocKProtocol.sortDatanodes(), it uses the results returned from > getNodesByAddress to determine if the client submitting the request is > running on a cluster node, and if it is, it attempts to sort the datanodes by > distance from the client machine. > To do this, the code currently takes the first DatanodeDetails object > returned by getHostsByAddress and then compares it with the other passed in > nodes. If any of the passed nodes are equal to the client node (based on the > Java object ID) it returns a zero distance, otherwise the distance is > calculated. > The sort is performed in NetworkTopologyImpl.sortByDistanceCost() which later > calls NetworkTopologyImpl.getDistanceCost() which is where the object > comparison is performed: > {code} > if ((node1 != null && node2 != null && node1.equals(node2)) || > (node1 == null && node2 == null)) { > return 0; > } > {code} > This does not always work when there are many datanodes on the same host, as > the first node returned from getNodesByAddress() is guaranteed to be on the > same host as the client, but the list of passed datanodes may not include > that datanode instance. > To fix this, we should probably have getDistanceCost() compare hostnames or > IP as a second check or instead of the object equality, however this is not > trivial to implement. > The reason is that getDistanceCost() takes Node objects (not > DatanodeDetails) and a Node does not have an IP or Hostname field. It does > have a getNetworkName method, which should return the hostname, but it is > overwritten by the host's UUID when it registered with the node manager, by this > line in NodeManager.register(): > datanodeDetails.setNetworkName(datanodeDetails.getUuidString()); > > Note this only affects test clusters where many DNs are on a single host, and > it does not cause any failures. The DNs may be returned in a less than ideal > order. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2249) SortDatanodes does not return correct orders when many DNs on a given host
[ https://issues.apache.org/jira/browse/HDDS-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974047#comment-16974047 ] Sammi Chen commented on HDDS-2249: -- Thanks [~swagle] for reporting this. One idea that comes to mind: how about using hostname:port as the key in dnsToUuidMap? If it works, will it solve this issue? > SortDatanodes does not return correct orders when many DNs on a given host > -- > > Key: HDDS-2249 > URL: https://issues.apache.org/jira/browse/HDDS-2249 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.5.0 >Reporter: Stephen O'Donnell >Priority: Major > > In HDDS-2199 ScmNodeManager.getNodeByAddress() was changed to return a list > of nodes rather than a single entry, to handle the case where many datanodes > are running on the same host. > In SCMBlocKProtocol.sortDatanodes(), it uses the results returned from > getNodesByAddress to determine if the client submitting the request is > running on a cluster node, and if it is, it attempts to sort the datanodes by > distance from the client machine. > To do this, the code currently takes the first DatanodeDetails object > returned by getHostsByAddress and then compares it with the other passed in > nodes. If any of the passed nodes are equal to the client node (based on the > Java object ID) it returns a zero distance, otherwise the distance is > calculated. > The sort is performed in NetworkTopologyImpl.sortByDistanceCost() which later > calls NetworkTopologyImpl.getDistanceCost() which is where the object > comparison is performed: > {code} > if ((node1 != null && node2 != null && node1.equals(node2)) || > (node1 == null && node2 == null)) { > return 0; > } > {code} > This does not always work when there are many datanodes on the same host, as > the first node returned from getNodesByAddress() is guaranteed to be on the > same host as the client, but the list of passed datanodes may not include > that datanode instance. > To fix this, we should probably have getDistanceCost() compare hostnames or > IP as a second check or instead of the object equality, however this is not > trivial to implement. > The reason is that getDistanceCost() takes Node objects (not > DatanodeDetails) and a Node does not have an IP or Hostname field. It does > have a getNetworkName method, which should return the hostname, but it is > overwritten by the host's UUID when it registered with the node manager, by this > line in NodeManager.register(): > datanodeDetails.setNetworkName(datanodeDetails.getUuidString()); > > Note this only affects test clusters where many DNs are on a single host, and > it does not cause any failures. The DNs may be returned in a less than ideal > order. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
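A minimal sketch of the keying idea from the comment above, with made-up class and method names (the real ScmNodeManager code differs, and after HDDS-2199 the map actually holds a list of UUIDs per host):

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the proposal: key registrations by "host:port" instead of host only,
// so each datanode process on a shared host gets a distinct entry.
public class DnsToUuidMapSketch {
  private final Map<String, String> dnsToUuidMap = new ConcurrentHashMap<>();

  public void register(String hostName, int ratisPort, String uuid) {
    dnsToUuidMap.put(hostName + ":" + ratisPort, uuid);
  }

  public String getUuidByAddress(String hostName, int ratisPort) {
    return dnsToUuidMap.get(hostName + ":" + ratisPort);
  }
}
{code}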
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971357#comment-16971357 ] Sammi Chen commented on HDDS-2356: -- Hi [~bharat], thanks for helping fix the multipart upload issues. Li and I are working on enabling Ozone in Tencent's production environment. Currently we have two main blocking issues: one is this multipart upload, the other is performance. Mukul and Shashi are helping us with the performance improvement. This multipart upload issue happens consistently in our environment with big files, say 5 GB in size. It will be more efficient if you try to reproduce the case locally. We would love to assist if you need any help reproducing it. > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > Attachments: 2019-11-06_18_13_57_422_ERROR, hs_err_pid9340.log, > image-2019-10-31-18-56-56-177.png, om_audit_log_plc_1570863541668_9278.txt > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. > > Updated on 11/06/2019: > See new multipart upload error NO_SUCH_MULTIPART_UPLOAD_ERROR and full logs > are in the attachment. > 2019-11-05 18:12:37,766 ERROR > org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest: > MultipartUpload Commit is failed for Key:./2 > 0191012/plc_1570863541668_9278 in Volume/Bucket > s325d55ad283aa400af464c76d713c07ad/ozone-test > NO_SUCH_MULTIPART_UPLOAD_ERROR > org.apache.hadoop.ozone.om.exceptions.OMException: No such Multipart upload > is with specified uploadId fcda8608-b431-48b7-8386- > 0a332f1a709a-103084683261641950 > at > org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:1 > 56) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB. 
> java:217) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:132) > at > org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:100) > at > org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > > Updated on 10/28/2019: > See MISMATCH_MULTIPART_LIST error. > > 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete > Multipart Upload Request for bucket: ozone-test, key: > 20191012/plc_1570863541668_927 > 8 > MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: > Complete Multipart Upload Failed: volume: > s3c89e813c80ffcea9543004d57b2a1239bucket: > ozone-testkey: 20191012/plc_1570863541668_9278 > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732) > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.completeMultipartUpload(Ozo
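To take goofys out of the picture when reproducing, the multipart flow can be driven directly against the S3 gateway with the AWS CLI. A sketch; the endpoint (default s3g port), part size, and the parts.json manifest are assumptions, and the key/bucket names are taken from the report:

{noformat}
# split a ~5 GB input into parts, then drive the multipart calls by hand
split -b 512M plc_1570863541668_9278 part_
aws s3api create-multipart-upload --endpoint-url http://localhost:9878 \
  --bucket ozone-test --key 20191012/plc_1570863541668_9278
aws s3api upload-part --endpoint-url http://localhost:9878 \
  --bucket ozone-test --key 20191012/plc_1570863541668_9278 \
  --part-number 1 --body part_aa --upload-id <UploadId from create>
# ... upload the remaining parts, record each returned ETag in parts.json, then:
aws s3api complete-multipart-upload --endpoint-url http://localhost:9878 \
  --bucket ozone-test --key 20191012/plc_1570863541668_9278 \
  --upload-id <UploadId from create> --multipart-upload file://parts.json
{noformat}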
[jira] [Updated] (HDDS-1576) Support configure more than one raft log storage to host multiple pipelines
[ https://issues.apache.org/jira/browse/HDDS-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-1576: - Summary: Support configure more than one raft log storage to host multiple pipelines (was: Support configure more than one raft log storage to host multiple pipeline) > Support configure more than one raft log storage to host multiple pipelines > --- > > Key: HDDS-1576 > URL: https://issues.apache.org/jira/browse/HDDS-1576 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Datanode, SCM >Reporter: Siddharth Wagle >Assignee: Sammi Chen >Priority: Major > > Support configure multiple raft log storage to host multiple THREE factor > RATIS pipelines. > Unless the storage is a fast media, datanode should try best to allocate > different raft log storage for new pipeline. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
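For reference, a sketch of what a multi-directory configuration could look like. The property name is the existing single-directory Ratis storage setting; the comma-separated form is an assumption about how this feature might surface:

{code:xml}
<property>
  <name>dfs.container.ratis.datanode.storage.dir</name>
  <value>/data1/ratis,/data2/ratis</value>
  <description>Raft log storage locations; with this feature the datanode
  would spread THREE factor pipelines across the listed directories.</description>
</property>
{code}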
[jira] [Updated] (HDDS-1576) Support configure more than one raft log storage to host multiple pipeline
[ https://issues.apache.org/jira/browse/HDDS-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-1576: - Description: Support configure multiple raft log storage to host multiple THREE factor RATIS pipelines. Unless the storage is a fast media, datanode should try best to allocate different raft log storage for new pipeline. was: Support configure multiple raft SCM should not try to create a raft group by placing the raft log on a disk that is already used by existing Ratis ring for an open pipeline. This constraint would have to be applied by either throwing an exception during pipeline creation or by looking at configs on the SCM side. Ensure constraint of one raft log per disk is met unless fast media > Support configure more than one raft log storage to host multiple pipeline > -- > > Key: HDDS-1576 > URL: https://issues.apache.org/jira/browse/HDDS-1576 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Datanode, SCM >Reporter: Siddharth Wagle >Assignee: Sammi Chen >Priority: Major > > Support configure multiple raft log storage to host multiple THREE factor > RATIS pipelines. > Unless the storage is a fast media, datanode should try best to allocate > different raft log storage for new pipeline. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1576) Support configure more than one raft storage to host multiple pipeline
[ https://issues.apache.org/jira/browse/HDDS-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-1576: - Description: Support configure multiple raft SCM should not try to create a raft group by placing the raft log on a disk that is already used by existing Ratis ring for an open pipeline. This constraint would have to be applied by either throwing an exception during pipeline creation or by looking at configs on the SCM side. Ensure constraint of one raft log per disk is met unless fast media was: SCM should not try to create a raft group by placing the raft log on a disk that is already used by existing Ratis ring for an open pipeline. This constraint would have to be applied by either throwing an exception during pipeline creation or by looking at configs on the SCM side. Ensure constraint of one raft log per disk is met unless fast media > Support configure more than one raft storage to host multiple pipeline > -- > > Key: HDDS-1576 > URL: https://issues.apache.org/jira/browse/HDDS-1576 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Datanode, SCM >Reporter: Siddharth Wagle >Assignee: Sammi Chen >Priority: Major > > Support configure multiple raft > SCM should not try to create a raft group by placing the raft log on a disk > that is already used by existing Ratis ring for an open pipeline. > This constraint would have to be applied by either throwing an exception > during pipeline creation or by looking at configs on the SCM side. > Ensure constraint of one raft log per disk is met unless fast media -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1576) Support configure more than one raft log storage to host multiple pipeline
[ https://issues.apache.org/jira/browse/HDDS-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-1576: - Summary: Support configure more than one raft log storage to host multiple pipeline (was: Support configure more than one raft storage to host multiple pipeline) > Support configure more than one raft log storage to host multiple pipeline > -- > > Key: HDDS-1576 > URL: https://issues.apache.org/jira/browse/HDDS-1576 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Datanode, SCM >Reporter: Siddharth Wagle >Assignee: Sammi Chen >Priority: Major > > Support configure multiple raft > SCM should not try to create a raft group by placing the raft log on a disk > that is already used by existing Ratis ring for an open pipeline. > This constraint would have to be applied by either throwing an exception > during pipeline creation or by looking at configs on the SCM side. > Ensure constraint of one raft log per disk is met unless fast media -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1576) Support configure more than one raft storage to host multiple pipeline
[ https://issues.apache.org/jira/browse/HDDS-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-1576: - Summary: Support configure more than one raft storage to host multiple pipeline (was: Ensure constraint of one raft log per disk is met unless fast media) > Support configure more than one raft storage to host multiple pipeline > -- > > Key: HDDS-1576 > URL: https://issues.apache.org/jira/browse/HDDS-1576 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Datanode, SCM >Reporter: Siddharth Wagle >Assignee: Sammi Chen >Priority: Major > > SCM should not try to create a raft group by placing the raft log on a disk > that is already used by existing Ratis ring for an open pipeline. > This constraint would have to be applied by either throwing an exception > during pipeline creation or by looking at configs on the SCM side. > Ensure constraint of one raft log per disk is met unless fast media -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1576) Ensure constraint of one raft log per disk is met unless fast media
[ https://issues.apache.org/jira/browse/HDDS-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-1576: - Description: SCM should not try to create a raft group by placing the raft log on a disk that is already used by existing Ratis ring for an open pipeline. This constraint would have to be applied by either throwing an exception during pipeline creation or by looking at configs on the SCM side. Ensure constraint of one raft log per disk is met unless fast media was: SCM should not try to create a raft group by placing the raft log on a disk that is already used by existing Ratis ring for an open pipeline. This constraint would have to be applied by either throwing an exception during pipeline creation or by looking at configs on the SCM side. > Ensure constraint of one raft log per disk is met unless fast media > --- > > Key: HDDS-1576 > URL: https://issues.apache.org/jira/browse/HDDS-1576 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Datanode, SCM >Reporter: Siddharth Wagle >Assignee: Sammi Chen >Priority: Major > > SCM should not try to create a raft group by placing the raft log on a disk > that is already used by existing Ratis ring for an open pipeline. > This constraint would have to be applied by either throwing an exception > during pipeline creation or by looking at configs on the SCM side. > Ensure constraint of one raft log per disk is met unless fast media -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-1576) Ensure constraint of one raft log per disk is met unless fast media
[ https://issues.apache.org/jira/browse/HDDS-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen reassigned HDDS-1576: Assignee: Sammi Chen (was: Li Cheng) > Ensure constraint of one raft log per disk is met unless fast media > --- > > Key: HDDS-1576 > URL: https://issues.apache.org/jira/browse/HDDS-1576 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Datanode, SCM >Reporter: Siddharth Wagle >Assignee: Sammi Chen >Priority: Major > > SCM should not try to create a raft group by placing the raft log on a disk > that is already used by existing Ratis ring for an open pipeline. > This constraint would have to be applied by either throwing an exception > during pipeline creation or by looking at configs on the SCM side. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-2376) Fail to read data through XceiverClientGrpc
[ https://issues.apache.org/jira/browse/HDDS-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen resolved HDDS-2376. -- Resolution: Not A Bug > Fail to read data through XceiverClientGrpc > --- > > Key: HDDS-2376 > URL: https://issues.apache.org/jira/browse/HDDS-2376 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Sammi Chen >Assignee: Hanisha Koneru >Priority: Blocker > > Run teragen, application failed with following stack, > 19/10/29 14:35:42 INFO mapreduce.Job: Running job: job_1567133159094_0048 > 19/10/29 14:35:59 INFO mapreduce.Job: Job job_1567133159094_0048 running in > uber mode : false > 19/10/29 14:35:59 INFO mapreduce.Job: map 0% reduce 0% > 19/10/29 14:35:59 INFO mapreduce.Job: Job job_1567133159094_0048 failed with > state FAILED due to: Application application_1567133159094_0048 failed 2 > times due to AM Container for appattempt_1567133159094_0048_02 exited > with exitCode: -1000 > For more detailed output, check application tracking > page:http://host183:8088/cluster/app/application_1567133159094_0048Then, > click on links to logs of each attempt. > Diagnostics: Unexpected OzoneException: > org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at > index 0 > java.io.IOException: Unexpected OzoneException: > org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at > index 0 > at > org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:342) > at > org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkFromContainer(ChunkInputStream.java:307) > at > org.apache.hadoop.hdds.scm.storage.ChunkInputStream.prepareRead(ChunkInputStream.java:259) > at > org.apache.hadoop.hdds.scm.storage.ChunkInputStream.read(ChunkInputStream.java:144) > at > org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:239) > at > org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171) > at > org.apache.hadoop.fs.ozone.OzoneFSInputStream.read(OzoneFSInputStream.java:52) > at java.io.DataInputStream.read(DataInputStream.java:100) > at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:86) > at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:60) > at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:120) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366) > at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:267) > at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63) > at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361) > at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:359) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum > mismatch at index 0 > at > 
org.apache.hadoop.ozone.common.ChecksumData.verifyChecksumDataMatches(ChecksumData.java:148) > at > org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:275) > at > org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:238) > at > org.apache.hadoop.hdds.scm.storage.ChunkInputStream.lambda$new$0(ChunkInputStream.java:375) > at > org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:287) > at > org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:250) > at > org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:233) > at > org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:245) > at > org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:335) > ... 26 more > Caused by: Checksum mismatch at index 0 > org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at > index 0 > at > org.apache.hadoop.
[jira] [Comment Edited] (HDDS-2376) Fail to read data through XceiverClientGrpc
[ https://issues.apache.org/jira/browse/HDDS-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964661#comment-16964661 ] Sammi Chen edited comment on HDDS-2376 at 11/1/19 6:56 AM: --- The root cause is that I didn't restart Hadoop 2.7.5 after I deployed the latest Ozone binary, so Hadoop was still using an old version of the Ozone client (from 2 months before). This OzoneChecksumException is thrown by the NodeManager. Logs attached. It seems something changed on the Ozone server side, which means an old version of the Ozone client can't verify the data written by itself. [~msingh] and [~hanishakoneru], thanks for paying attention to this issue. I will close it now. 2019-11-01 11:46:02,230 ERROR org.apache.hadoop.hdds.scm.XceiverClientGrpc: Failed to execute command cmdType: ReadChunk traceID: "" containerID: 1145 datanodeUuid: "ed90869c-317e-4303-8922-9fa83a3983cb" readChunk { blockID { containerID: 1145 localID: 103060600027086850 blockCommitSequenceId: 948 } chunkData { chunkName: "103060600027086850_chunk_1" offset: 0 len: 245 checksumData { type: CRC32 bytesPerChecksum: 1048576 checksums: "\247\304Yf" } } } on datanode 1da74a1d-f64d-4ad4-b04c-85f26687e683 org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at index 0 at org.apache.hadoop.ozone.common.ChecksumData.verifyChecksumDataMatches(ChecksumData.java:148) at org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:275) at org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:238) at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.lambda$new$0(ChunkInputStream.java:375) at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:287) at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:250) at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:233) at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:245) at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:335) at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkFromContainer(ChunkInputStream.java:307) at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.prepareRead(ChunkInputStream.java:259) at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.read(ChunkInputStream.java:144) at org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:239) at org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171) at org.apache.hadoop.fs.ozone.OzoneFSInputStream.read(OzoneFSInputStream.java:52) at java.io.DataInputStream.read(DataInputStream.java:100) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:86) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:60) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:120) at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366) at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:267) at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63) at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361) at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:359) at 
org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2019-11-01 11:46:02,243 ERROR org.apache.hadoop.hdds.scm.XceiverClientGrpc: Failed to execute command cmdType: ReadChunk traceID: "" containerID: 1145 datanodeUuid: "ed90869c-317e-4303-8922-9fa83a3983cb" readChunk { blockID { containerID: 1145 localID: 103060600027086850 blockCommitSequenceId: 948 } chunkData { chunkName: "103060600027086850_chunk_1" offset: 0 len: 245 checksumData { type: CRC32 bytesPerChecksum: 1048576 checksums: "\247\304Yf" } } } on datanode ed90869c-317e-4303-8922-9fa83a3983cb org.apache.hadoop.ozone.common.OzoneCh
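For readers unfamiliar with the verify path in the logs above: the reader recomputes a checksum per bytesPerChecksum window of the chunk and compares it with the checksum the writer stored, so any writer/reader disagreement (e.g., a version skew changing which bytes a checksum covers) surfaces as this mismatch. A minimal, self-contained illustration of that check, not the actual Ozone code:

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class ChecksumSketch {

  static long crc32(byte[] data, int off, int len) {
    CRC32 crc = new CRC32();
    crc.update(data, off, len);
    return crc.getValue();
  }

  // Throws if any window's recomputed checksum differs from the stored one,
  // mirroring the "Checksum mismatch at index N" failure in the logs.
  static void verify(byte[] chunk, long[] stored, int bytesPerChecksum) {
    for (int i = 0; i * bytesPerChecksum < chunk.length; i++) {
      int off = i * bytesPerChecksum;
      int len = Math.min(bytesPerChecksum, chunk.length - off);
      if (crc32(chunk, off, len) != stored[i]) {
        throw new IllegalStateException("Checksum mismatch at index " + i);
      }
    }
  }

  public static void main(String[] args) {
    byte[] chunk = "example chunk data".getBytes(StandardCharsets.UTF_8);
    long[] stored = {crc32(chunk, 0, chunk.length)};   // writer side
    verify(chunk, stored, 1048576);                    // reader side: passes
  }
}
{code}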
[jira] [Commented] (HDDS-2376) Fail to read data through XceiverClientGrpc
[ https://issues.apache.org/jira/browse/HDDS-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964661#comment-16964661 ] Sammi Chen commented on HDDS-2376: -- The root cause is that I didn't restart Hadoop 2.7.5 after I deployed the latest Ozone binary, so Hadoop was still using an old version of the Ozone client (from 2 months before). This OzoneChecksumException is thrown by the NodeManager. Logs attached. It seems something changed on the Ozone server side, which means an old version of the Ozone client can't verify the data written by itself. 2019-11-01 11:46:02,230 ERROR org.apache.hadoop.hdds.scm.XceiverClientGrpc: Failed to execute command cmdType: ReadChunk traceID: "" containerID: 1145 datanodeUuid: "ed90869c-317e-4303-8922-9fa83a3983cb" readChunk { blockID { containerID: 1145 localID: 103060600027086850 blockCommitSequenceId: 948 } chunkData { chunkName: "103060600027086850_chunk_1" offset: 0 len: 245 checksumData { type: CRC32 bytesPerChecksum: 1048576 checksums: "\247\304Yf" } } } on datanode 1da74a1d-f64d-4ad4-b04c-85f26687e683 org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at index 0 at org.apache.hadoop.ozone.common.ChecksumData.verifyChecksumDataMatches(ChecksumData.java:148) at org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:275) at org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:238) at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.lambda$new$0(ChunkInputStream.java:375) at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:287) at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:250) at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:233) at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:245) at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:335) at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkFromContainer(ChunkInputStream.java:307) at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.prepareRead(ChunkInputStream.java:259) at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.read(ChunkInputStream.java:144) at org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:239) at org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171) at org.apache.hadoop.fs.ozone.OzoneFSInputStream.read(OzoneFSInputStream.java:52) at java.io.DataInputStream.read(DataInputStream.java:100) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:86) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:60) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:120) at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366) at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:267) at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63) at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361) at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:359) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2019-11-01 11:46:02,243 ERROR org.apache.hadoop.hdds.scm.XceiverClientGrpc: Failed to execute command cmdType: ReadChunk traceID: "" containerID: 1145 datanodeUuid: "ed90869c-317e-4303-8922-9fa83a3983cb" readChunk { blockID { containerID: 1145 localID: 103060600027086850 blockCommitSequenceId: 948 } chunkData { chunkName: "103060600027086850_chunk_1" offset: 0 len: 245 checksumData { type: CRC32 bytesPerChecksum: 1048576 checksums: "\247\304Yf" } } } on datanode ed90869c-317e-4303-8922-9fa83a3983cb org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at index 0 at org.apache.hadoop.ozone.common.ChecksumData.verifyChecksumDataMatches(ChecksumData.ja
[jira] [Updated] (HDDS-2363) Failed to create Ratis container
[ https://issues.apache.org/jira/browse/HDDS-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-2363: - Fix Version/s: 0.5.0 > Failed to create Ratis container > > > Key: HDDS-2363 > URL: https://issues.apache.org/jira/browse/HDDS-2363 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Blocker > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Error logs; > 2019-10-29 10:24:59,553 [pool-7-thread-1] ERROR - > org.rocksdb.RocksDBException Failed init RocksDB, db path : > /data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db, > exception > :/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db: > does not exist (create_if_missing is false) > CACHED_OPTS is a RockDB options cache in MetadataStoreBuilder. The cache > keeps the old rocksdb options which is not refreshed with new option values > at new call. > Logs as following didn't reveal the true failure of write failure. Will > improve following logs too. > 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO - Operation: > CreateContainer : Trace ID: : Message: Container creation failed. : Result: > CONTAINER_INTERNAL_ERROR > 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO - Operation: WriteChunk > : Trace ID: : Message: ContainerID 402 creation failed : Result: > CONTAINER_INTERNAL_ERROR -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
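The CACHED_OPTS bug described above is a classic stale-cache pitfall: the first caller's createIfMissing value gets baked into a cached RocksDB Options object and silently reused for every later open. A minimal, runnable illustration with the RocksDB Java API (class name and paths below are made up for the demo; this is not the MetadataStoreBuilder code):

{code:java}
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class CachedOptionsDemo {
  private static Options cachedOpts;   // stands in for CACHED_OPTS

  static RocksDB open(String path, boolean createIfMissing) throws RocksDBException {
    if (cachedOpts == null) {
      cachedOpts = new Options().setCreateIfMissing(createIfMissing);
    }
    // Bug: the createIfMissing flag passed by later callers is ignored.
    return RocksDB.open(cachedOpts, path);
  }

  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    // Pre-create db1 so the first cached open (createIfMissing=false) succeeds.
    try (Options o = new Options().setCreateIfMissing(true);
         RocksDB db = RocksDB.open(o, "/tmp/db1")) { }
    open("/tmp/db1", false).close();   // seeds the cache with createIfMissing=false
    open("/tmp/db2", true);            // fails: "/tmp/db2: does not exist (create_if_missing is false)"
  }
}
{code}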
[jira] [Resolved] (HDDS-2363) Failed to create Ratis container
[ https://issues.apache.org/jira/browse/HDDS-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen resolved HDDS-2363. -- Resolution: Fixed > Failed to create Ratis container > > > Key: HDDS-2363 > URL: https://issues.apache.org/jira/browse/HDDS-2363 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Blocker > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Error logs; > 2019-10-29 10:24:59,553 [pool-7-thread-1] ERROR - > org.rocksdb.RocksDBException Failed init RocksDB, db path : > /data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db, > exception > :/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db: > does not exist (create_if_missing is false) > CACHED_OPTS is a RockDB options cache in MetadataStoreBuilder. The cache > keeps the old rocksdb options which is not refreshed with new option values > at new call. > Logs as following didn't reveal the true failure of write failure. Will > improve following logs too. > 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO - Operation: > CreateContainer : Trace ID: : Message: Container creation failed. : Result: > CONTAINER_INTERNAL_ERROR > 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO - Operation: WriteChunk > : Trace ID: : Message: ContainerID 402 creation failed : Result: > CONTAINER_INTERNAL_ERROR -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2363) Failed to create Ratis container
[ https://issues.apache.org/jira/browse/HDDS-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-2363: - Summary: Failed to create Ratis container (was: Fail to create Ratis container) > Failed to create Ratis container > > > Key: HDDS-2363 > URL: https://issues.apache.org/jira/browse/HDDS-2363 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Blocker > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Error logs; > 2019-10-29 10:24:59,553 [pool-7-thread-1] ERROR - > org.rocksdb.RocksDBException Failed init RocksDB, db path : > /data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db, > exception > :/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db: > does not exist (create_if_missing is false) > CACHED_OPTS is a RockDB options cache in MetadataStoreBuilder. The cache > keeps the old rocksdb options which is not refreshed with new option values > at new call. > Logs as following didn't reveal the true failure of write failure. Will > improve following logs too. > 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO - Operation: > CreateContainer : Trace ID: : Message: Container creation failed. : Result: > CONTAINER_INTERNAL_ERROR > 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO - Operation: WriteChunk > : Trace ID: : Message: ContainerID 402 creation failed : Result: > CONTAINER_INTERNAL_ERROR -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2376) Fail to read data through XceiverClientGrpc
[ https://issues.apache.org/jira/browse/HDDS-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963727#comment-16963727 ] Sammi Chen commented on HDDS-2376: -- Hi [~msingh] and [~hanishakoneru], I didn't find any WARN or ERROR logs on the OM, SCM or datanodes. I will add more logging to collect more info. > Fail to read data through XceiverClientGrpc > --- > > Key: HDDS-2376 > URL: https://issues.apache.org/jira/browse/HDDS-2376 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Sammi Chen >Assignee: Hanisha Koneru >Priority: Blocker > > Run teragen, application failed with following stack, > 19/10/29 14:35:42 INFO mapreduce.Job: Running job: job_1567133159094_0048 > 19/10/29 14:35:59 INFO mapreduce.Job: Job job_1567133159094_0048 running in > uber mode : false > 19/10/29 14:35:59 INFO mapreduce.Job: map 0% reduce 0% > 19/10/29 14:35:59 INFO mapreduce.Job: Job job_1567133159094_0048 failed with > state FAILED due to: Application application_1567133159094_0048 failed 2 > times due to AM Container for appattempt_1567133159094_0048_02 exited > with exitCode: -1000 > For more detailed output, check application tracking > page:http://host183:8088/cluster/app/application_1567133159094_0048Then, > click on links to logs of each attempt. > Diagnostics: Unexpected OzoneException: > org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at > index 0 > java.io.IOException: Unexpected OzoneException: > org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at > index 0 > at > org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:342) > at > org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkFromContainer(ChunkInputStream.java:307) > at > org.apache.hadoop.hdds.scm.storage.ChunkInputStream.prepareRead(ChunkInputStream.java:259) > at > org.apache.hadoop.hdds.scm.storage.ChunkInputStream.read(ChunkInputStream.java:144) > at > org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:239) > at > org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171) > at > org.apache.hadoop.fs.ozone.OzoneFSInputStream.read(OzoneFSInputStream.java:52) > at java.io.DataInputStream.read(DataInputStream.java:100) > at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:86) > at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:60) > at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:120) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366) > at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:267) > at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63) > at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361) > at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:359) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at 
java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum > mismatch at index 0 > at > org.apache.hadoop.ozone.common.ChecksumData.verifyChecksumDataMatches(ChecksumData.java:148) > at > org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:275) > at > org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:238) > at > org.apache.hadoop.hdds.scm.storage.ChunkInputStream.lambda$new$0(ChunkInputStream.java:375) > at > org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:287) > at > org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:250) > at > org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:233) > at > org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:245) > at > org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:335) > .
[jira] [Updated] (HDDS-2376) Fail to read data through XceiverClientGrpc
[ https://issues.apache.org/jira/browse/HDDS-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-2376: - Description: Run teragen, application failed with following stack, 19/10/29 14:35:42 INFO mapreduce.Job: Running job: job_1567133159094_0048 19/10/29 14:35:59 INFO mapreduce.Job: Job job_1567133159094_0048 running in uber mode : false 19/10/29 14:35:59 INFO mapreduce.Job: map 0% reduce 0% 19/10/29 14:35:59 INFO mapreduce.Job: Job job_1567133159094_0048 failed with state FAILED due to: Application application_1567133159094_0048 failed 2 times due to AM Container for appattempt_1567133159094_0048_02 exited with exitCode: -1000 For more detailed output, check application tracking page:http://host183:8088/cluster/app/application_1567133159094_0048Then, click on links to logs of each attempt. Diagnostics: Unexpected OzoneException: org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at index 0 java.io.IOException: Unexpected OzoneException: org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at index 0 at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:342) at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkFromContainer(ChunkInputStream.java:307) at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.prepareRead(ChunkInputStream.java:259) at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.read(ChunkInputStream.java:144) at org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:239) at org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171) at org.apache.hadoop.fs.ozone.OzoneFSInputStream.read(OzoneFSInputStream.java:52) at java.io.DataInputStream.read(DataInputStream.java:100) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:86) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:60) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:120) at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366) at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:267) at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63) at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361) at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:359) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at index 0 at org.apache.hadoop.ozone.common.ChecksumData.verifyChecksumDataMatches(ChecksumData.java:148) at org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:275) at org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:238) at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.lambda$new$0(ChunkInputStream.java:375) at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:287) at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:250) at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:233) at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:245) at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:335) ... 26 more Caused by: Checksum mismatch at index 0 org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at index 0 at org.apache.hadoop.ozone.common.ChecksumData.verifyChecksumDataMatches(ChecksumData.java:148) at org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:275) at org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:238) at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.lambda$new$0(ChunkInputStream.java:375) at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:287) at org.apache.h
[jira] [Created] (HDDS-2376) Fail to read data through XceiverClientGrpc
Sammi Chen created HDDS-2376: Summary: Fail to read data through XceiverClientGrpc Key: HDDS-2376 URL: https://issues.apache.org/jira/browse/HDDS-2376 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Sammi Chen Run teragen, application failed with following stack, 19/10/29 14:35:42 INFO mapreduce.Job: Running job: job_1567133159094_0048 19/10/29 14:35:59 INFO mapreduce.Job: Job job_1567133159094_0048 running in uber mode : false 19/10/29 14:35:59 INFO mapreduce.Job: map 0% reduce 0% 19/10/29 14:35:59 INFO mapreduce.Job: Job job_1567133159094_0048 failed with state FAILED due to: Application application_1567133159094_0048 failed 2 times due to AM Container for appattempt_1567133159094_0048_02 exited with exitCode: -1000 For more detailed output, check application tracking page:http://host183:8088/cluster/app/application_1567133159094_0048Then, click on links to logs of each attempt. Diagnostics: Unexpected OzoneException: org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at index 0 java.io.IOException: Unexpected OzoneException: org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at index 0 at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:342) at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkFromContainer(ChunkInputStream.java:307) at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.prepareRead(ChunkInputStream.java:259) at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.read(ChunkInputStream.java:144) at org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:239) at org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171) at org.apache.hadoop.fs.ozone.OzoneFSInputStream.read(OzoneFSInputStream.java:52) at java.io.DataInputStream.read(DataInputStream.java:100) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:86) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:60) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:120) at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366) at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:267) at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63) at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361) at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:359) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at index 0 at org.apache.hadoop.ozone.common.ChecksumData.verifyChecksumDataMatches(ChecksumData.java:148) at org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:275) at org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:238) at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.lambda$new$0(ChunkInputStream.java:375) at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:287) at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:250) at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:233) at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:245) at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:335) ... 26 more Caused by: Checksum mismatch at index 0 org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at index 0 at org.apache.hadoop.ozone.common.ChecksumData.verifyChecksumDataMatches(ChecksumData.java:148) at org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:275) at org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:238) at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.lambda$new$0(ChunkInputStream.j
[jira] [Updated] (HDDS-2363) Fail to create Ratis container
[ https://issues.apache.org/jira/browse/HDDS-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-2363: - Priority: Blocker (was: Critical) > Fail to create Ratis container > -- > > Key: HDDS-2363 > URL: https://issues.apache.org/jira/browse/HDDS-2363 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Blocker > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Error logs; > 2019-10-29 10:24:59,553 [pool-7-thread-1] ERROR - > org.rocksdb.RocksDBException Failed init RocksDB, db path : > /data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db, > exception > :/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db: > does not exist (create_if_missing is false) > CACHED_OPTS is a RockDB options cache in MetadataStoreBuilder. The cache > keeps the old rocksdb options which is not refreshed with new option values > at new call. > Logs as following didn't reveal the true failure of write failure. Will > improve following logs too. > 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO - Operation: > CreateContainer : Trace ID: : Message: Container creation failed. : Result: > CONTAINER_INTERNAL_ERROR > 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO - Operation: WriteChunk > : Trace ID: : Message: ContainerID 402 creation failed : Result: > CONTAINER_INTERNAL_ERROR -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
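[Editor's note] The CACHED_OPTS explanation in the description above is a classic stale-cache bug: the cache key does not include the option values, so the options object built for the first caller is handed to every later caller. A contrived sketch of the pitfall (DbOptions and the cache below are stand-ins for illustration, not the real MetadataStoreBuilder or RocksDB classes):

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of the caching pitfall: whoever populates the cache
// first decides the option values for every subsequent caller.
public class OptionsCachePitfall {

  static class DbOptions {
    final boolean createIfMissing;
    DbOptions(boolean createIfMissing) { this.createIfMissing = createIfMissing; }
  }

  // Analogous to CACHED_OPTS in MetadataStoreBuilder.
  private static final Map<String, DbOptions> CACHED_OPTS = new ConcurrentHashMap<>();

  static DbOptions getOptions(String profile, boolean createIfMissing) {
    // Bug: the cache key ignores createIfMissing, so a stale options
    // object is returned even when the caller asked for a different flag.
    return CACHED_OPTS.computeIfAbsent(profile, p -> new DbOptions(createIfMissing));
  }

  public static void main(String[] args) {
    DbOptions first = getOptions("ssd", false);  // e.g. opening an existing DB
    DbOptions second = getOptions("ssd", true);  // creating a new container DB
    // Prints "false": the new container DB open then fails with
    // "does not exist (create_if_missing is false)".
    System.out.println(second.createIfMissing);
  }
}
{code}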
[jira] [Updated] (HDDS-2363) Fail to create Ratis container
[ https://issues.apache.org/jira/browse/HDDS-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-2363: - Description: Error logs: 2019-10-29 10:24:59,553 [pool-7-thread-1] ERROR - org.rocksdb.RocksDBException Failed init RocksDB, db path : /data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db, exception :/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db: does not exist (create_if_missing is false) CACHED_OPTS is a RocksDB options cache in MetadataStoreBuilder. The cache keeps the old RocksDB options, which are not refreshed with new option values on subsequent calls. The following logs didn't reveal the true cause of the write failure. Will improve these logs too. 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO - Operation: CreateContainer : Trace ID: : Message: Container creation failed. : Result: CONTAINER_INTERNAL_ERROR 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO - Operation: WriteChunk : Trace ID: : Message: ContainerID 402 creation failed : Result: CONTAINER_INTERNAL_ERROR was: Error logs; 2019-10-29 10:24:59,553 [pool-7-thread-1] ERROR - org.rocksdb.RocksDBException Failed init RocksDB, db path : /data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db, exception :/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db: does not exist (create_if_missing is false) Logs as following didn't reveal the true failure of write failure. Will improve following logs too. 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO - Operation: CreateContainer : Trace ID: : Message: Container creation failed. : Result: CONTAINER_INTERNAL_ERROR 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO - Operation: WriteChunk : Trace ID: : Message: ContainerID 402 creation failed : Result: CONTAINER_INTERNAL_ERROR > Fail to create Ratis container > -- > > Key: HDDS-2363 > URL: https://issues.apache.org/jira/browse/HDDS-2363 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Critical > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Error logs; > 2019-10-29 10:24:59,553 [pool-7-thread-1] ERROR - > org.rocksdb.RocksDBException Failed init RocksDB, db path : > /data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db, > exception > :/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db: > does not exist (create_if_missing is false) > CACHED_OPTS is a RockDB options cache in MetadataStoreBuilder. The cache > keeps the old rocksdb options which is not refreshed with new option values > at new call. > Logs as following didn't reveal the true failure of write failure. Will > improve following logs too. > 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO - Operation: > CreateContainer : Trace ID: : Message: Container creation failed. : Result: > CONTAINER_INTERNAL_ERROR > 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO - Operation: WriteChunk > : Trace ID: : Message: ContainerID 402 creation failed : Result: > CONTAINER_INTERNAL_ERROR -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2363) Fail to create Ratis container
[ https://issues.apache.org/jira/browse/HDDS-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-2363: - Summary: Fail to create Ratis container (was: Fail to create Ratis pipeline ) > Fail to create Ratis container > -- > > Key: HDDS-2363 > URL: https://issues.apache.org/jira/browse/HDDS-2363 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Critical > > Error logs; > 2019-10-29 10:24:59,553 [pool-7-thread-1] ERROR - > org.rocksdb.RocksDBException Failed init RocksDB, db path : > /data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db, > exception > :/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db: > does not exist (create_if_missing is false) > Logs as following didn't reveal the true failure of write failure. Will > improve following logs too. > 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO - Operation: > CreateContainer : Trace ID: : Message: Container creation failed. : Result: > CONTAINER_INTERNAL_ERROR > 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO - Operation: WriteChunk > : Trace ID: : Message: ContainerID 402 creation failed : Result: > CONTAINER_INTERNAL_ERROR -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2363) Fail to create Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-2363: - Description: Error logs; 2019-10-29 10:24:59,553 [pool-7-thread-1] ERROR - org.rocksdb.RocksDBException Failed init RocksDB, db path : /data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db, exception :/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db: does not exist (create_if_missing is false) Logs as following didn't reveal the true failure of write failure. Will improve following logs too. 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO - Operation: CreateContainer : Trace ID: : Message: Container creation failed. : Result: CONTAINER_INTERNAL_ERROR 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO - Operation: WriteChunk : Trace ID: : Message: ContainerID 402 creation failed : Result: CONTAINER_INTERNAL_ERROR was: Logs as following didn't reveal the true failure of write failure. 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO - Operation: CreateContainer : Trace ID: : Message: Container creation failed. : Result: CONTAINER_INTERNAL_ERROR 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO - Operation: WriteChunk : Trace ID: : Message: ContainerID 402 creation failed : Result: CONTAINER_INTERNAL_ERROR > Fail to create Ratis pipeline > -- > > Key: HDDS-2363 > URL: https://issues.apache.org/jira/browse/HDDS-2363 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Major > > Error logs; > 2019-10-29 10:24:59,553 [pool-7-thread-1] ERROR - > org.rocksdb.RocksDBException Failed init RocksDB, db path : > /data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db, > exception > :/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db: > does not exist (create_if_missing is false) > Logs as following didn't reveal the true failure of write failure. Will > improve following logs too. > 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO - Operation: > CreateContainer : Trace ID: : Message: Container creation failed. : Result: > CONTAINER_INTERNAL_ERROR > 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO - Operation: WriteChunk > : Trace ID: : Message: ContainerID 402 creation failed : Result: > CONTAINER_INTERNAL_ERROR -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2363) Fail to create Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-2363: - Priority: Critical (was: Major) > Fail to create Ratis pipeline > -- > > Key: HDDS-2363 > URL: https://issues.apache.org/jira/browse/HDDS-2363 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Critical > > Error logs; > 2019-10-29 10:24:59,553 [pool-7-thread-1] ERROR - > org.rocksdb.RocksDBException Failed init RocksDB, db path : > /data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db, > exception > :/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db: > does not exist (create_if_missing is false) > Logs as following didn't reveal the true failure of write failure. Will > improve following logs too. > 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO - Operation: > CreateContainer : Trace ID: : Message: Container creation failed. : Result: > CONTAINER_INTERNAL_ERROR > 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO - Operation: WriteChunk > : Trace ID: : Message: ContainerID 402 creation failed : Result: > CONTAINER_INTERNAL_ERROR -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2363) Fail to create Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-2363: - Summary: Fail to create Ratis pipeline (was: Improve datanode write failure log) > Fail to create Ratis pipeline > -- > > Key: HDDS-2363 > URL: https://issues.apache.org/jira/browse/HDDS-2363 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Datanode >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Major > > Logs as following didn't reveal the true failure of write failure. > 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO - Operation: > CreateContainer : Trace ID: : Message: Container creation failed. : Result: > CONTAINER_INTERNAL_ERROR > 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO - Operation: WriteChunk > : Trace ID: : Message: ContainerID 402 creation failed : Result: > CONTAINER_INTERNAL_ERROR -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2363) Fail to create Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-2363: - Issue Type: Bug (was: Improvement) > Fail to create Ratis pipeline > -- > > Key: HDDS-2363 > URL: https://issues.apache.org/jira/browse/HDDS-2363 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Major > > Logs as following didn't reveal the true failure of write failure. > 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO - Operation: > CreateContainer : Trace ID: : Message: Container creation failed. : Result: > CONTAINER_INTERNAL_ERROR > 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO - Operation: WriteChunk > : Trace ID: : Message: ContainerID 402 creation failed : Result: > CONTAINER_INTERNAL_ERROR -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2363) Improve datanode write failure log
[ https://issues.apache.org/jira/browse/HDDS-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-2363: - Description: Logs as following didn't reveal the true failure of write failure. 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO - Operation: CreateContainer : Trace ID: : Message: Container creation failed. : Result: CONTAINER_INTERNAL_ERROR 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO - Operation: WriteChunk : Trace ID: : Message: ContainerID 402 creation failed : Result: CONTAINER_INTERNAL_ERROR was: Logs as following haven't reveal the true failure of write failure. 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO - Operation: CreateContainer : Trace ID: : Message: Container creation failed. : Result: CONTAINER_INTERNAL_ERROR 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO - Operation: WriteChunk : Trace ID: : Message: ContainerID 402 creation failed : Result: CONTAINER_INTERNAL_ERROR > Improve datanode write failure log > -- > > Key: HDDS-2363 > URL: https://issues.apache.org/jira/browse/HDDS-2363 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Datanode >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Major > > Logs as following didn't reveal the true failure of write failure. > 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO - Operation: > CreateContainer : Trace ID: : Message: Container creation failed. : Result: > CONTAINER_INTERNAL_ERROR > 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO - Operation: WriteChunk > : Trace ID: : Message: ContainerID 402 creation failed : Result: > CONTAINER_INTERNAL_ERROR -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2363) Improve datanode write failure log
Sammi Chen created HDDS-2363: Summary: Improve datanode write failure log Key: HDDS-2363 URL: https://issues.apache.org/jira/browse/HDDS-2363 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: Ozone Datanode Reporter: Sammi Chen Assignee: Sammi Chen Logs as following haven't reveal the true failure of write failure. 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO - Operation: CreateContainer : Trace ID: : Message: Container creation failed. : Result: CONTAINER_INTERNAL_ERROR 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO - Operation: WriteChunk : Trace ID: : Message: ContainerID 402 creation failed : Result: CONTAINER_INTERNAL_ERROR -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-2307) ContextFactory.java contains Windows '^M' at end of each line
[ https://issues.apache.org/jira/browse/HDDS-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen resolved HDDS-2307. -- Resolution: Not A Problem > ContextFactory.java contains Windows '^M' at end of each line > - > > Key: HDDS-2307 > URL: https://issues.apache.org/jira/browse/HDDS-2307 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Sammi Chen >Assignee: YiSheng Lien >Priority: Major > Labels: newbie > > Convert the file to Unix format. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2307) ContextFactory.java contains Windows '^M' at end of each line
[ https://issues.apache.org/jira/browse/HDDS-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958494#comment-16958494 ] Sammi Chen commented on HDDS-2307: -- Hi [~cxorm], thanks for the investigation. It's actually a Hadoop file, not an Ozone file. I will close this JIRA and track it on the Hadoop side. > ContextFactory.java contains Windows '^M' at end of each line > - > > Key: HDDS-2307 > URL: https://issues.apache.org/jira/browse/HDDS-2307 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Sammi Chen >Assignee: YiSheng Lien >Priority: Major > Labels: newbie > > Convert the file to Unix format. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2307) ContextFactory.java contains Windows '^M' at end of each line
Sammi Chen created HDDS-2307: Summary: ContextFactory.java contains Windows '^M' at end of each line Key: HDDS-2307 URL: https://issues.apache.org/jira/browse/HDDS-2307 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Sammi Chen Convert the file to Unix format. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2177) Add a scrubber thread to detect creation-failure pipelines in ALLOCATED state
Sammi Chen created HDDS-2177: Summary: Add a scrubber thread to detect creation-failure pipelines in ALLOCATED state Key: HDDS-2177 URL: https://issues.apache.org/jira/browse/HDDS-2177 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Sammi Chen -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
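[Editor's note] There is no description yet beyond the summary; presumably the scrubber periodically scans for pipelines that have sat in ALLOCATED past some timeout and reclaims them. A minimal sketch of that idea, assuming that reading of the summary (PipelineManager, Pipeline, and the one-minute cadence below are all hypothetical):

{code:java}
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of a background scrubber that tears down pipelines stuck in
// ALLOCATED longer than a timeout. Interfaces are illustrative stand-ins.
public class AllocatedPipelineScrubber {

  interface Pipeline {
    boolean isAllocated();
    Instant creationTime();
  }

  interface PipelineManager {
    List<Pipeline> listPipelines();
    void destroyPipeline(Pipeline p);
  }

  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  void start(PipelineManager manager, Duration timeout) {
    scheduler.scheduleWithFixedDelay(() -> {
      Instant deadline = Instant.now().minus(timeout);
      for (Pipeline p : manager.listPipelines()) {
        // A pipeline still ALLOCATED past the deadline never received
        // creation confirmations from all of its datanodes; reclaim it.
        if (p.isAllocated() && p.creationTime().isBefore(deadline)) {
          manager.destroyPipeline(p);
        }
      }
    }, 1, 1, TimeUnit.MINUTES);
  }
}
{code}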
[jira] [Created] (HDDS-2176) Add new pipeline state “CLOSING” and new CLOSE_PIPELINE_STATUS command
Sammi Chen created HDDS-2176: Summary: Add new pipeline state “CLOSING” and new CLOSE_PIPELINE_STATUS command Key: HDDS-2176 URL: https://issues.apache.org/jira/browse/HDDS-2176 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Sammi Chen Assignee: Sammi Chen Currently a pipeline has 3 states: ALLOCATED, OPEN and CLOSED. When the create pipeline command is sent out to the datanodes from SCM, the pipeline is marked as ALLOCATED in SCM. Once SCM has received creation confirmation from all 3 datanodes, SCM changes the pipeline's state from ALLOCATED to OPEN. The close pipeline process is similar. Add a new CLOSING state to the pipeline: when the close pipeline command is sent out to the datanodes, the pipeline is marked as CLOSING; when all 3 datanodes have confirmed, the pipeline state changes from CLOSING to CLOSED. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
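[Editor's note] A minimal sketch of the state machine this proposes, with CLOSING mirroring ALLOCATED on the close path. The enum and the confirmation counting are illustrative only, not SCM's actual pipeline code:

{code:java}
// Sketch of the pipeline lifecycle with the proposed CLOSING state added.
public class PipelineLifecycle {

  enum PipelineState { ALLOCATED, OPEN, CLOSING, CLOSED }

  private PipelineState state = PipelineState.ALLOCATED;
  private int confirmations;           // datanode acks for the pending action
  private static final int REPLICAS = 3;

  /** Called when a datanode confirms pipeline creation. */
  synchronized void onCreateConfirmed() {
    if (state == PipelineState.ALLOCATED && ++confirmations == REPLICAS) {
      state = PipelineState.OPEN;      // all 3 datanodes confirmed creation
      confirmations = 0;
    }
  }

  /** Called when SCM sends the close command out to the datanodes. */
  synchronized void startClose() {
    if (state == PipelineState.OPEN) {
      state = PipelineState.CLOSING;   // close in flight, mirror of ALLOCATED
    }
  }

  /** Called when a datanode confirms pipeline close. */
  synchronized void onCloseConfirmed() {
    if (state == PipelineState.CLOSING && ++confirmations == REPLICAS) {
      state = PipelineState.CLOSED;    // all 3 datanodes confirmed close
    }
  }
}
{code}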
[jira] [Commented] (HDDS-1933) Datanode should use hostname in place of ip addresses to allow DN's to work when ipaddress change
[ https://issues.apache.org/jira/browse/HDDS-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932361#comment-16932361 ] Sammi Chen commented on HDDS-1933: -- Hi [~msingh], offering the user the option to choose IP address or hostname as the datanode identity is a tradition in HDFS. We borrowed the idea from HDFS so that Ozone can be easily adopted in network environments where HDFS was previously deployed. In many DCs, static IPs are used for datanodes, and in that case it's safe to use the IP address as the datanode identity. I would propose to keep this option for users. If the cluster uses hostnames because IP addresses may change after a restart, as in a Kubernetes cluster, the user can simply set "dfs.datanode.use.datanode.hostname" to true (the default is false). > Datanode should use hostname in place of ip addresses to allow DN's to work > when ipaddress change > - > > Key: HDDS-1933 > URL: https://issues.apache.org/jira/browse/HDDS-1933 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode, SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Priority: Blocker > > This was noticed by [~elek] while deploying Ozone on Kubernetes based > environment. > When the datanode ip address change on restart, the Datanode details cease to > be correct for the datanode. and this prevents the cluster from functioning > after a restart. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
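[Editor's note] dfs.datanode.use.datanode.hostname is the existing HDFS property named in the comment above. A small sketch of how the choice plays out through org.apache.hadoop.conf.Configuration; only the property name and its false default come from the comment, while the selectIdentity helper is made up for illustration:

{code:java}
import org.apache.hadoop.conf.Configuration;

// Minimal sketch of how the datanode identity choice can be driven by
// configuration. The surrounding selection logic is illustrative only.
public class DatanodeIdentity {

  static String selectIdentity(Configuration conf, String hostName, String ipAddr) {
    // Default is false: use the (static) IP address as the datanode identity.
    boolean useHostname =
        conf.getBoolean("dfs.datanode.use.datanode.hostname", false);
    // On Kubernetes-like clusters where IPs change across restarts,
    // set the property to true so the stable hostname is used instead.
    return useHostname ? hostName : ipAddr;
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setBoolean("dfs.datanode.use.datanode.hostname", true);
    System.out.println(selectIdentity(conf, "dn1.example.com", "10.0.0.5"));
  }
}
{code}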
[jira] [Updated] (HDDS-2034) Async RATIS pipeline creation and destroy through heartbeat commands
[ https://issues.apache.org/jira/browse/HDDS-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-2034: - Summary: Async RATIS pipeline creation and destroy through heartbeat commands (was: Async pipeline creation and destroy through heartbeat commands) > Async RATIS pipeline creation and destroy through heartbeat commands > > > Key: HDDS-2034 > URL: https://issues.apache.org/jira/browse/HDDS-2034 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Major > > Currently, pipeline creation and destroy are synchronous operations. SCM > directly connects to each datanode of the pipeline through a gRPC channel to > create or destroy the pipeline. > This task is to remove the gRPC channel and send the pipeline creation and > destroy actions through heartbeat commands to each datanode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
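[Editor's note] A rough sketch of the heartbeat-driven flow the description outlines: SCM queues a command that rides back on a heartbeat response, and the datanode dispatches it to a local handler instead of serving a direct gRPC call. Every type below is a hypothetical stand-in for the real SCM/datanode classes:

{code:java}
import java.util.List;
import java.util.UUID;

// Illustrative sketch of command dispatch on the datanode side.
public class HeartbeatCommandFlow {

  enum CommandType { CREATE_PIPELINE, CLOSE_PIPELINE }

  static class PipelineCommand {
    final CommandType type;
    final UUID pipelineId;
    final List<String> members;   // datanodes participating in the pipeline
    PipelineCommand(CommandType type, UUID pipelineId, List<String> members) {
      this.type = type;
      this.pipelineId = pipelineId;
      this.members = members;
    }
  }

  interface CommandHandler {
    CommandType handledType();
    void handle(PipelineCommand cmd);
  }

  /** Route each command carried in a heartbeat response to its handler. */
  static void dispatch(List<PipelineCommand> fromHeartbeat,
                       List<CommandHandler> handlers) {
    for (PipelineCommand cmd : fromHeartbeat) {
      for (CommandHandler h : handlers) {
        if (h.handledType() == cmd.type) {
          h.handle(cmd);   // e.g. create or close the local Ratis raft group
        }
      }
    }
  }
}
{code}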
[jira] [Updated] (HDDS-2034) Async pipeline creation and destroy through heartbeat commands
[ https://issues.apache.org/jira/browse/HDDS-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-2034: - Description: Currently, pipeline creation and destroy are synchronous operations. SCM directly connects to each datanode of the pipeline through a gRPC channel to create or destroy the pipeline. This task is to remove the gRPC channel and send the pipeline creation and destroy actions through heartbeat commands to each datanode. > Async pipeline creation and destroy through heartbeat commands > -- > > Key: HDDS-2034 > URL: https://issues.apache.org/jira/browse/HDDS-2034 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Major > > Currently, pipeline creation and destroy are synchronous operations. SCM > directly connects to each datanode of the pipeline through a gRPC channel to > create or destroy the pipeline. > This task is to remove the gRPC channel and send the pipeline creation and > destroy actions through heartbeat commands to each datanode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-2033) Support join multiple pipelines on datanode
[ https://issues.apache.org/jira/browse/HDDS-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen reassigned HDDS-2033: Assignee: Sammi Chen > Support join multiple pipelines on datanode > --- > > Key: HDDS-2033 > URL: https://issues.apache.org/jira/browse/HDDS-2033 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2034) Async pipeline creation and destroy through heartbeat commands
[ https://issues.apache.org/jira/browse/HDDS-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-2034: - Summary: Async pipeline creation and destroy through heartbeat commands (was: Add create pipeline command dispatcher and handle) > Async pipeline creation and destroy through heartbeat commands > -- > > Key: HDDS-2034 > URL: https://issues.apache.org/jira/browse/HDDS-2034 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2118) Datanode fail to start after stop
[ https://issues.apache.org/jira/browse/HDDS-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-2118: - Description: Steps: 1. Ran Teragen and generated a few GB of data in a 4-datanode cluster. 2. Stopped the datanodes through ./stop-ozone.sh. 3. Changed the ozone binaries. 4. Started the cluster through ./start-ozone.sh. 5. Two datanodes registered with SCM; two datanodes failed to appear on the SCM side. Checked these two failed nodes; the datanode process is still running. In the log file, I found a lot of the following errors. 2019-09-12 21:06:45,255 [Datanode State Machine Thread - 0] INFO - Starting XceiverServerRatis ba17ad5e-714e-4d82-85d8-ff2e0737fcf9 at port 9858 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO - Attempting to start container services. 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO - Background container scanner has been disabled. 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO - Starting XceiverServerRatis ba17ad5e-714e-4d82-85d8-ff2e0737fcf9 at port 9858 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] ERROR - Unable to communicate to SCM server at 10.120.110.183:9861 for past 2100 seconds. org.apache.ratis.protocol.ChecksumException: LogEntry is corrupt. Calculated checksum is -134141393 but read checksum 0 at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.decodeEntry(SegmentedRaftLogReader.java:299) at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.readEntry(SegmentedRaftLogReader.java:185) at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogInputStream.nextEntry(SegmentedRaftLogInputStream.java:121) at org.apache.ratis.server.raftlog.segmented.LogSegment.readSegmentFile(LogSegment.java:94) at org.apache.ratis.server.raftlog.segmented.LogSegment.loadSegment(LogSegment.java:117) at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogCache.loadSegment(SegmentedRaftLogCache.java:310) at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.loadLogSegments(SegmentedRaftLog.java:234) at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.openImpl(SegmentedRaftLog.java:204) at org.apache.ratis.server.raftlog.RaftLog.open(RaftLog.java:247) at org.apache.ratis.server.impl.ServerState.initRaftLog(ServerState.java:190) at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:120) at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:110) at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208) at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) was: 2019-09-12 21:06:45,255 [Datanode State Machine Thread - 0] INFO - Starting XceiverServerRatis ba17ad5e-714e-4d82-85d8-ff2e0737fcf9 at port 9858 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO - Attempting to start container services. 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO - Background container scanner has been disabled. 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO - Starting XceiverServerRatis ba17ad5e-714e-4d82-85d8-ff2e0737fcf9 at port 9858 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] ERROR - Unable to communicate to SCM server at 10.120.110.183:9861 for past 2100 seconds. 
org.apache.ratis.protocol.ChecksumException: LogEntry is corrupt. Calculated checksum is -134141393 but read checksum 0 at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.decodeEntry(SegmentedRaftLogReader.java:299) at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.readEntry(SegmentedRaftLogReader.java:185) at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogInputStream.nextEntry(SegmentedRaftLogInputStream.java:121) at org.apache.ratis.server.raftlog.segmented.LogSegment.readSegmentFile(LogSegment.java:94) at org.apache.ratis.server.raftlog.segmented.LogSegment.loadSegment(LogSegment.java:117) at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogCache.loadSegment(SegmentedRaftLogCache.java:310) at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.loadLogSegments(SegmentedRaftLog.java:234) at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.openImpl(SegmentedRaftLog.java:204) at org.apache.ratis.server.raftlog.RaftLog.open(RaftLog.java:247) at org.apache.ratis.server.impl.ServerState.initRaftLog(ServerState.java:190)
[jira] [Updated] (HDDS-2118) Datanode fail to function after stop
[ https://issues.apache.org/jira/browse/HDDS-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-2118: - Summary: Datanode fail to function after stop (was: Datanode fail to start after stop) > Datanode fail to function after stop > > > Key: HDDS-2118 > URL: https://issues.apache.org/jira/browse/HDDS-2118 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Sammi Chen >Priority: Major > > Steps: > 1. Run Teragen and generated a few GB data in a 4 datanodes cluster. > 2. Stoped the datanodes through ./stop-ozone.sh. > 3. Changed the ozone binaries > 4. Start the cluster through ./start-ozone.sh. > 5. Two datanode regisisterd to SCM. Two datanode fail to appear at SCM side. > > Checked these two failed node, datanode process is still running. In the > logfile, I found a lot of following errors. > 2019-09-12 21:06:45,255 [Datanode State Machine Thread - 0] INFO - > Starting XceiverServerRatis ba17ad5e-714e-4d82-85d8-ff2e0737fcf9 at port 9858 > 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO - > Attempting to start container services. > 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO - > Background container scanner has been disabled. > 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO - > Starting XceiverServerRatis ba17ad5e-714e-4d82-85d8-ff2e0737fcf9 at port 9858 > 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] ERROR - > Unable to communicate to SCM server at 10.120.110.183:9861 for past 2100 > seconds. > org.apache.ratis.protocol.ChecksumException: LogEntry is corrupt. Calculated > checksum is -134141393 but read checksum 0 > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.decodeEntry(SegmentedRaftLogReader.java:299) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.readEntry(SegmentedRaftLogReader.java:185) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogInputStream.nextEntry(SegmentedRaftLogInputStream.java:121) > at > org.apache.ratis.server.raftlog.segmented.LogSegment.readSegmentFile(LogSegment.java:94) > at > org.apache.ratis.server.raftlog.segmented.LogSegment.loadSegment(LogSegment.java:117) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogCache.loadSegment(SegmentedRaftLogCache.java:310) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.loadLogSegments(SegmentedRaftLog.java:234) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.openImpl(SegmentedRaftLog.java:204) > at org.apache.ratis.server.raftlog.RaftLog.open(RaftLog.java:247) > at > org.apache.ratis.server.impl.ServerState.initRaftLog(ServerState.java:190) > at > org.apache.ratis.server.impl.ServerState.(ServerState.java:120) > at > org.apache.ratis.server.impl.RaftServerImpl.(RaftServerImpl.java:110) > at > org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208) > at > java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2118) Datanode fail to start after stop
Sammi Chen created HDDS-2118: Summary: Datanode fail to start after stop Key: HDDS-2118 URL: https://issues.apache.org/jira/browse/HDDS-2118 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Sammi Chen 2019-09-12 21:06:45,255 [Datanode State Machine Thread - 0] INFO - Starting XceiverServerRatis ba17ad5e-714e-4d82-85d8-ff2e0737fcf9 at port 9858 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO - Attempting to start container services. 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO - Background container scanner has been disabled. 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO - Starting XceiverServerRatis ba17ad5e-714e-4d82-85d8-ff2e0737fcf9 at port 9858 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] ERROR - Unable to communicate to SCM server at 10.120.110.183:9861 for past 2100 seconds. org.apache.ratis.protocol.ChecksumException: LogEntry is corrupt. Calculated checksum is -134141393 but read checksum 0 at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.decodeEntry(SegmentedRaftLogReader.java:299) at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.readEntry(SegmentedRaftLogReader.java:185) at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogInputStream.nextEntry(SegmentedRaftLogInputStream.java:121) at org.apache.ratis.server.raftlog.segmented.LogSegment.readSegmentFile(LogSegment.java:94) at org.apache.ratis.server.raftlog.segmented.LogSegment.loadSegment(LogSegment.java:117) at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogCache.loadSegment(SegmentedRaftLogCache.java:310) at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.loadLogSegments(SegmentedRaftLog.java:234) at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.openImpl(SegmentedRaftLog.java:204) at org.apache.ratis.server.raftlog.RaftLog.open(RaftLog.java:247) at org.apache.ratis.server.impl.ServerState.initRaftLog(ServerState.java:190) at org.apache.ratis.server.impl.ServerState.(ServerState.java:120) at org.apache.ratis.server.impl.RaftServerImpl.(RaftServerImpl.java:110) at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208) at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2106) Avoid usage of hadoop projects as parent of hdds/ozone
[ https://issues.apache.org/jira/browse/HDDS-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928315#comment-16928315 ] Sammi Chen commented on HDDS-2106: -- Hi [~elek], I met a build issue after rebasing on trunk. Following is the console log. I used the build command "mvn clean install -T 6 -Pdist -Phdds -DskipTests -Dmaven.javadoc.skip=true -am -pl :hadoop-ozone-dist". I see maven-javadoc-plugin.version is defined as "3.0.1". My local Maven is 3.6.0. I don't know why the build fails. [INFO] Scanning for projects... [ERROR] [ERROR] Some problems were encountered while processing the POMs: [ERROR] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-javadoc-plugin must be a valid version but is '${maven-javadoc-plugin.version}'. @ org.apache.hadoop:hadoop-main-ozone:0.5.0-SNAPSHOT, /Users/sammi/workspace/hadoop/pom.ozone.xml, line 1604, column 20 [ERROR] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-javadoc-plugin must be a valid version but is '${maven-javadoc-plugin.version}'. @ org.apache.hadoop:hadoop-main-ozone:0.5.0-SNAPSHOT, /Users/sammi/workspace/hadoop/pom.ozone.xml, line 1604, column 20 > Avoid usage of hadoop projects as parent of hdds/ozone > -- > > Key: HDDS-2106 > URL: https://issues.apache.org/jira/browse/HDDS-2106 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Elek, Marton >Assignee: Elek, Marton >Priority: Blocker > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Ozone uses hadoop as a dependency. The dependency defined on multiple level: > 1. the hadoop artifacts are defined in the sections > 2. both hadoop-ozone and hadoop-hdds projects uses "hadoop-project" as the > parent > As we already have a slightly different assembly process it could be more > resilient to use a dedicated parent project instead of the hadoop one. With > this approach it will be easier to upgrade the versions as we don't need to > be careful about the pom contents only about the used dependencies. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-2034) Add create pipeline command dispatcher and handle
[ https://issues.apache.org/jira/browse/HDDS-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen reassigned HDDS-2034: Assignee: Sammi Chen > Add create pipeline command dispatcher and handle > - > > Key: HDDS-2034 > URL: https://issues.apache.org/jira/browse/HDDS-2034 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2069) Default values of property hdds.datanode.storage.utilization.critical.threshold and hdds.datanode.storage.utilization.warning.threshold are not reasonable
[ https://issues.apache.org/jira/browse/HDDS-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-2069: - Summary: Default values of property hdds.datanode.storage.utilization.critical.threshold and hdds.datanode.storage.utilization.warning.threshold are not reasonable (was: Value of property hdds.datanode.storage.utilization.critical.threshold and hdds.datanode.storage.utilization.warning.threshold is not reasonable) > Default values of property > hdds.datanode.storage.utilization.critical.threshold and > hdds.datanode.storage.utilization.warning.threshold are not reasonable > -- > > Key: HDDS-2069 > URL: https://issues.apache.org/jira/browse/HDDS-2069 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Major > > Currently, hdds.datanode.storage.utilization.warning.threshold is 0.95 and > hdds.datanode.storage.utilization.critical.threshold is 0.75. > The values should be exchanged. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
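[Editor's note] The point of the proposed swap is that the warning threshold must trip before the critical one, i.e. warning < critical. A small sketch with the corrected ordering; the classify method and messages are illustrative, not the actual datanode code, and the 0.75/0.95 values follow from the issue description:

{code:java}
// Sketch of the intended relationship after exchanging the defaults.
public class StorageUtilizationCheck {

  static final double WARN_THRESHOLD = 0.75;      // previously, wrongly, 0.95
  static final double CRITICAL_THRESHOLD = 0.95;  // previously, wrongly, 0.75

  static String classify(long usedBytes, long capacityBytes) {
    double utilization = (double) usedBytes / capacityBytes;
    if (utilization >= CRITICAL_THRESHOLD) {
      return "CRITICAL";   // volume nearly full, stop allocating here
    } else if (utilization >= WARN_THRESHOLD) {
      return "WARNING";    // getting full, surface an operator warning first
    }
    return "OK";
  }

  public static void main(String[] args) {
    // 80% used: a WARNING under the corrected defaults, not yet CRITICAL.
    System.out.println(classify(80, 100));
  }
}
{code}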
[jira] [Created] (HDDS-2069) Value of property hdds.datanode.storage.utilization.critical.threshold and hdds.datanode.storage.utilization.warning.threshold is not reasonable
Sammi Chen created HDDS-2069: Summary: Value of property hdds.datanode.storage.utilization.critical.threshold and hdds.datanode.storage.utilization.warning.threshold is not reasonable Key: HDDS-2069 URL: https://issues.apache.org/jira/browse/HDDS-2069 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Sammi Chen Assignee: Sammi Chen Currently, hdds.datanode.storage.utilization.warning.threshold is 0.95 and hdds.datanode.storage.utilization.critical.threshold is 0.75. The values should be exchanged. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1553) Add metrics in rack aware container placement policy
[ https://issues.apache.org/jira/browse/HDDS-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917357#comment-16917357 ] Sammi Chen commented on HDDS-1553: -- Hi [~ljain], I just attached the initial patch. Feel free to give any feedback. > Add metrics in rack aware container placement policy > > > Key: HDDS-1553 > URL: https://issues.apache.org/jira/browse/HDDS-1553 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > To collect the following statistics, > 1. total requested datanode count (A) > 2. successfully allocated datanode count without constraint compromise (B) > 3. successfully allocated datanode count with some constraint compromise (C) > B includes C, failed allocation = (A - B) -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
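[Editor's note] For reference, a bare-bones sketch of the three statistics listed in the issue, using plain AtomicLongs rather than the Hadoop metrics2 classes the actual patch presumably uses; all names here are invented:

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// A = requested, B = allocated (with or without compromise), C = allocated
// with constraint compromise. Since B includes C, failures are A - B.
public class PlacementMetricsSketch {

  private final AtomicLong datanodeRequestCount = new AtomicLong();            // A
  private final AtomicLong datanodeAllocatedCount = new AtomicLong();          // B
  private final AtomicLong datanodeCompromiseAllocatedCount = new AtomicLong(); // C

  void onRequest(int count) {
    datanodeRequestCount.addAndGet(count);
  }

  void onAllocated(boolean constraintsCompromised) {
    datanodeAllocatedCount.incrementAndGet();
    if (constraintsCompromised) {
      datanodeCompromiseAllocatedCount.incrementAndGet();
    }
  }

  long failedAllocations() {
    // failed allocation = (A - B), as defined in the issue description
    return datanodeRequestCount.get() - datanodeAllocatedCount.get();
  }
}
{code}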
[jira] [Assigned] (HDDS-1571) Create an interface for pipeline placement policy to support network topologies
[ https://issues.apache.org/jira/browse/HDDS-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen reassigned HDDS-1571: Assignee: Sammi Chen (was: Siddharth Wagle) > Create an interface for pipeline placement policy to support network > topologies > --- > > Key: HDDS-1571 > URL: https://issues.apache.org/jira/browse/HDDS-1571 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Siddharth Wagle >Assignee: Sammi Chen >Priority: Major > > Leverage the work done in HDDS-700 for pipeline creation for open containers. > Create an interface that can provide different policy implementations for > pipeline creation. The default implementation should take into account no > topology information is configured. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-1569) Add ability to SCM for creating multiple pipelines with same datanode
[ https://issues.apache.org/jira/browse/HDDS-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen reassigned HDDS-1569: Assignee: Li Cheng (was: Siddharth Wagle) > Add ability to SCM for creating multiple pipelines with same datanode > - > > Key: HDDS-1569 > URL: https://issues.apache.org/jira/browse/HDDS-1569 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Siddharth Wagle >Assignee: Li Cheng >Priority: Major > > - Refactor _RatisPipelineProvider.create()_ to be able to create pipelines > with datanodes that are not a part of sufficient pipelines > - Define soft and hard upper bounds for pipeline membership > - Create SCMAllocationManager that can be leveraged to get a candidate set of > datanodes based on placement policies > - Add the datanodes to internal datastructures -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2036) Multi-raft support on single datanode integration test
Sammi Chen created HDDS-2036: Summary: Multi-raft support on single datanode integration test Key: HDDS-2036 URL: https://issues.apache.org/jira/browse/HDDS-2036 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Sammi Chen Improve MiniOzoneCluster to support multi-raft groups. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2035) Improve CLI listPipeline
Sammi Chen created HDDS-2035: Summary: Improve CLI listPipeline Key: HDDS-2035 URL: https://issues.apache.org/jira/browse/HDDS-2035 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Sammi Chen 1. Filter pipelines by datanode. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1574) ensure same datanodes are not a part of multiple pipelines
[ https://issues.apache.org/jira/browse/HDDS-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-1574: - Summary: ensure same datanodes are not a part of multiple pipelines (was: Implement pipeline placement policy to ensure same datanodes are not a part of multiple pipelines) > ensure same datanodes are not a part of multiple pipelines > -- > > Key: HDDS-1574 > URL: https://issues.apache.org/jira/browse/HDDS-1574 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Siddharth Wagle >Assignee: Siddharth Wagle >Priority: Major > > Details in design doc. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2034) Add create pipeline command dispatcher and handle
Sammi Chen created HDDS-2034: Summary: Add create pipeline command dispatcher and handle Key: HDDS-2034 URL: https://issues.apache.org/jira/browse/HDDS-2034 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Sammi Chen -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1574) Implement pipeline placement policy to ensure same datanodes are not a part of multiple pipelines
[ https://issues.apache.org/jira/browse/HDDS-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-1574: - Summary: Implement pipeline placement policy to ensure same datanodes are not a part of multiple pipelines (was: Implement pipeline choose policy to ensure same datanodes are not a part of multiple pipelines) > Implement pipeline placement policy to ensure same datanodes are not a part > of multiple pipelines > - > > Key: HDDS-1574 > URL: https://issues.apache.org/jira/browse/HDDS-1574 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Siddharth Wagle >Assignee: Siddharth Wagle >Priority: Major > > Details in design doc. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1574) Implement pipeline choose policy to ensure same datanodes are not a part of multiple pipelines
[ https://issues.apache.org/jira/browse/HDDS-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-1574: - Summary: Implement pipeline choose policy to ensure same datanodes are not a part of multiple pipelines (was: Ensure that same datanodes are not a part of multiple pipelines) > Implement pipeline choose policy to ensure same datanodes are not a part of > multiple pipelines > -- > > Key: HDDS-1574 > URL: https://issues.apache.org/jira/browse/HDDS-1574 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Siddharth Wagle >Assignee: Siddharth Wagle >Priority: Major > > Details in design doc. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2033) Support join multiple pipelines on datanode
Sammi Chen created HDDS-2033: Summary: Support join multiple pipelines on datanode Key: HDDS-2033 URL: https://issues.apache.org/jira/browse/HDDS-2033 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Sammi Chen -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1930) Test Topology Aware Job scheduling with Ozone Topology
[ https://issues.apache.org/jira/browse/HDDS-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-1930: - Description: My initial results with Terasort do not seem to report the counters properly. Most of the requests are handled as rack-local but none as node-local. This ticket is opened to add more system testing to validate the feature.
Total Allocated Containers: 3778
Each table cell represents the number of NodeLocal/RackLocal/OffSwitch containers satisfied by NodeLocal/RackLocal/OffSwitch resource requests.
                                          Node Local Request | Rack Local Request | Off Switch Request
Num Node Local Containers (satisfied by)  0                  |                    |
Num Rack Local Containers (satisfied by)  0                  | 3648               |
Num Off Switch Containers (satisfied by)  0                  | 96                 | 34
was: My initial results with Terasort does not seem to report the counter properly. Most of the requests are handled by rack locl but no node local. This ticket is opened to add more system testing to validate the feature. Total Allocated Containers: 3778 Each table cell represents the number of NodeLocal/RackLocal/OffSwitch containers satisfied by NodeLocal/RackLocal/OffSwitch resource requests. Node Local Request Rack Local Request Off Switch Request Num Node Local Containers (satisfied by)0 Num Rack Local Containers (satisfied by)0 3648 Num Off Switch Containers (satisfied by)0 96 34
> Test Topology Aware Job scheduling with Ozone Topology > -- > > Key: HDDS-1930 > URL: https://issues.apache.org/jira/browse/HDDS-1930 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Priority: Major > > My initial results with Terasort do not seem to report the counters > properly. Most of the requests are handled as rack-local but none as > node-local. This ticket is opened to add more system testing to validate the feature.
> Total Allocated Containers: 3778
> Each table cell represents the number of NodeLocal/RackLocal/OffSwitch containers satisfied by NodeLocal/RackLocal/OffSwitch resource requests.
>                                           Node Local Request | Rack Local Request | Off Switch Request
> Num Node Local Containers (satisfied by)  0                  |                    |
> Num Rack Local Containers (satisfied by)  0                  | 3648               |
> Num Off Switch Containers (satisfied by)  0                  | 96                 | 34
-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2031) Choose datanode for pipeline creation based on network topology
Sammi Chen created HDDS-2031: Summary: Choose datanode for pipeline creation based on network topology Key: HDDS-2031 URL: https://issues.apache.org/jira/browse/HDDS-2031 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Sammi Chen Assignee: Sammi Chen There are regular heartbeats between datanodes in a pipeline. Choosing datanodes based on network topology guarantees data reliability and reduces heartbeat network latency. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
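For illustration, a sketch of a topology-aware choice that keeps pipeline members rack-local where possible, so Ratis heartbeats mostly stay within one rack. It assumes a single rack level and at least one healthy node; the types and logic are simplified stand-ins, not the SCM implementation.

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative node with a network location such as "/rack1". */
record Node(String id, String rack) {}

class TopologyAwarePipelineChooser {
  /**
   * Choose datanodes for a pipeline, preferring the anchor's rack.
   * Assumes healthy is non-empty and has at least `required` nodes.
   */
  static List<Node> choose(List<Node> healthy, int required) {
    List<Node> shuffled = new ArrayList<>(healthy);
    Collections.shuffle(shuffled);
    Node anchor = shuffled.get(0);

    List<Node> chosen = new ArrayList<>();
    chosen.add(anchor);
    // First pass: nodes in the same rack as the anchor.
    for (Node n : shuffled) {
      if (chosen.size() == required) break;
      if (!chosen.contains(n) && n.rack().equals(anchor.rack())) chosen.add(n);
    }
    // Second pass: fall back to any remaining healthy node.
    for (Node n : shuffled) {
      if (chosen.size() == required) break;
      if (!chosen.contains(n)) chosen.add(n);
    }
    return chosen;
  }
}
{code}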
[jira] [Created] (HDDS-1963) OM DB Schema definition in OmMetadataManagerImpl and OzoneConsts are not consistent
Sammi Chen created HDDS-1963: Summary: OM DB Schema definition in OmMetadataManagerImpl and OzoneConsts are not consistent Key: HDDS-1963 URL: https://issues.apache.org/jira/browse/HDDS-1963 Project: Hadoop Distributed Data Store Issue Type: Improvement Reporter: Sammi Chen
OzoneConsts.java
 * OM DB Schema:
 * ------------------------------------------------------------
 * | KEY                                       | VALUE        |
 * ------------------------------------------------------------
 * | $userName                                 | VolumeList   |
 * ------------------------------------------------------------
 * | /#volumeName                              | VolumeInfo   |
 * ------------------------------------------------------------
 * | /#volumeName/#bucketName                  | BucketInfo   |
 * ------------------------------------------------------------
 * | /volumeName/bucketName/keyName            | KeyInfo      |
 * ------------------------------------------------------------
 * | #deleting#/volumeName/bucketName/keyName  | KeyInfo      |
 * ------------------------------------------------------------
OmMetadataManagerImpl.java
/**
 * OM RocksDB Structure.
 *
 * OM DB stores metadata as KV pairs in different column families.
 *
 * OM DB Schema:
 * |--------------------------------------------------------------|
 * | Column Family   | VALUE                                      |
 * |--------------------------------------------------------------|
 * | userTable       | user -> VolumeList                         |
 * | volumeTable     | /volume -> VolumeInfo                      |
 * | bucketTable     | /volume/bucket -> BucketInfo               |
 * | keyTable        | /volumeName/bucketName/keyName -> KeyInfo  |
 * | deletedTable    | /volumeName/bucketName/keyName -> KeyInfo  |
 * | openKey         | /volumeName/bucketName/keyName/id -> KeyInfo |
 * | s3Table         | s3BucketName -> /volumeName/bucketName     |
 * | s3SecretTable   | s3g_access_key_id -> s3Secret              |
 * | dTokenTable     | s3g_access_key_id -> s3Secret              |
 * | prefixInfoTable | prefix -> PrefixInfo                       |
 * |--------------------------------------------------------------|
 */
It is better to define the OM DB schema in one place to resolve this inconsistency caused by information redundancy. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
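One way to get a single source of truth, sketched below: declare each column family once and render the documentation table from that declaration. The enum and its contents are an illustrative assumption covering only a subset of the tables above, not a proposed patch.

{code:java}
/**
 * Sketch of a single source of truth for the OM DB schema, so the table
 * layout is declared once instead of twice. Names follow the comment in
 * OmMetadataManagerImpl; the enum itself is illustrative.
 */
enum OmDbColumnFamily {
  USER_TABLE("userTable", "user -> VolumeList"),
  VOLUME_TABLE("volumeTable", "/volume -> VolumeInfo"),
  BUCKET_TABLE("bucketTable", "/volume/bucket -> BucketInfo"),
  KEY_TABLE("keyTable", "/volumeName/bucketName/keyName -> KeyInfo"),
  DELETED_TABLE("deletedTable", "/volumeName/bucketName/keyName -> KeyInfo");

  private final String name;
  private final String mapping;

  OmDbColumnFamily(String name, String mapping) {
    this.name = name;
    this.mapping = mapping;
  }

  /** Render the schema documentation from the declarations themselves. */
  static String describeSchema() {
    StringBuilder sb = new StringBuilder("OM DB Schema:\n");
    for (OmDbColumnFamily cf : values()) {
      sb.append(String.format("  %-14s | %s%n", cf.name, cf.mapping));
    }
    return sb.toString();
  }
}
{code}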
[jira] [Assigned] (HDDS-1897) SCMNodeManager.java#getNodeByAddress cannot find nodes by addresses
[ https://issues.apache.org/jira/browse/HDDS-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen reassigned HDDS-1897: Assignee: Li Cheng > SCMNodeManager.java#getNodeByAddress cannot find nodes by addresses > --- > > Key: HDDS-1897 > URL: https://issues.apache.org/jira/browse/HDDS-1897 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Reporter: Mukul Kumar Singh >Assignee: Li Cheng >Priority: Major > Labels: MiniOzoneChaosCluster > > SCMNodeManager cannot find the nodes via ip addresses in MiniOzoneChaosCluster > {code} > 2019-08-02 13:57:01,501 WARN node.SCMNodeManager > (SCMNodeManager.java:getNodeByAddress(599)) - Cannot find node for address > 127.0.0.1 > {code} > cc: [~xyao] & [~Sammi] -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
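For illustration of why the lookup fails in MiniOzoneChaosCluster, where every datanode reports 127.0.0.1: a one-to-one address map can only remember a single node per address. A sketch of an address-to-many index, with simplified types (not the SCMNodeManager code):

{code:java}
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Sketch of address-based lookup that tolerates many datanodes sharing
 * one IP. A plain Map<String, Node> would only keep the last registration.
 */
class NodeAddressIndex {
  private final Map<String, Set<String>> nodesByAddress = new HashMap<>();

  void register(String ipAddress, String nodeUuid) {
    nodesByAddress.computeIfAbsent(ipAddress, k -> new HashSet<>())
        .add(nodeUuid);
  }

  /** All datanode UUIDs registered under the address, possibly several. */
  Set<String> getNodesByAddress(String ipAddress) {
    return nodesByAddress.getOrDefault(ipAddress, Collections.emptySet());
  }
}
{code}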
[jira] [Created] (HDDS-1958) Container time information returned by "scmcli list" is not human friendly
Sammi Chen created HDDS-1958: Summary: Container time information returned by "scmcli list" is not human friendly Key: HDDS-1958 URL: https://issues.apache.org/jira/browse/HDDS-1958 Project: Hadoop Distributed Data Store Issue Type: Improvement Reporter: Sammi Chen
ozone scmcli list -s=0
{
  "state" : "OPEN",
  "replicationFactor" : "ONE",
  "replicationType" : "STAND_ALONE",
  "usedBytes" : 0,
  "numberOfKeys" : 0,
  "lastUsed" : 13353985,
  "stateEnterTime" : 13316615,
  "owner" : "OZONE",
  "containerID" : 1,
  "deleteTransactionId" : 0,
  "sequenceId" : 0,
  "open" : true
}
lastUsed and stateEnterTime are not human friendly. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
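For illustration, a formatting sketch. It assumes the two fields are millisecond values; whether they are epoch milliseconds or an uptime offset (the small magnitudes above suggest the latter) decides which rendering applies, and both are shown.

{code:java}
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

/** Sketch of rendering the container timestamps for humans. */
class ContainerTimeFormatter {
  private static final DateTimeFormatter FMT =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
          .withZone(ZoneId.systemDefault());

  /** If the value is epoch millis, render it as a local date-time. */
  static String asDateTime(long epochMillis) {
    return FMT.format(Instant.ofEpochMilli(epochMillis));
  }

  /** If the value is an uptime offset, render it as a duration. */
  static String asDuration(long millis) {
    Duration d = Duration.ofMillis(millis);
    return String.format("%dh %dm %ds",
        d.toHours(), d.toMinutesPart(), d.toSecondsPart());
  }

  public static void main(String[] args) {
    System.out.println(asDuration(13353985)); // prints "3h 42m 33s"
  }
}
{code}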
[jira] [Created] (HDDS-1953) Remove pipeline persistence in SCM
Sammi Chen created HDDS-1953: Summary: Remove pipeline persistence in SCM Key: HDDS-1953 URL: https://issues.apache.org/jira/browse/HDDS-1953 Project: Hadoop Distributed Data Store Issue Type: Improvement Reporter: Sammi Chen Assignee: Sammi Chen Currently, SCM persists pipelines, together with their datanode information, in its local metastore. After an SCM restart, it reloads all pipelines from the metastore. If any datanode information changes during the SCM lifecycle, the persisted pipelines are not updated. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
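The ticket's direction is to stop persisting pipelines altogether. Purely to illustrate the staleness problem it removes, here is a sketch of the alternative: refreshing persisted details against the node manager's current view, matched on the stable UUID. Types and names are stand-ins, not SCM code.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Illustrative datanode details: a stable UUID plus a mutable address. */
record DnDetails(String uuid, String ipAddress) {}

class PipelineReload {
  /**
   * When reloading persisted pipelines after an SCM restart, replace the
   * stored (possibly stale) datanode details with the current ones known
   * to the node manager, matching on UUID.
   */
  static List<DnDetails> refresh(List<DnDetails> persisted,
                                 Map<String, DnDetails> currentByUuid) {
    List<DnDetails> refreshed = new ArrayList<>();
    for (DnDetails stale : persisted) {
      refreshed.add(Optional.ofNullable(currentByUuid.get(stale.uuid()))
          .orElse(stale)); // keep the stale entry if the node is unknown now
    }
    return refreshed;
  }
}
{code}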
[jira] [Updated] (HDDS-1882) TestReplicationManager failed with NPE in ReplicationManager.java
[ https://issues.apache.org/jira/browse/HDDS-1882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-1882: - Summary: TestReplicationManager failed with NPE in ReplicationManager.java (was: TestReplicationManager failed with NPE) > TestReplicationManager failed with NPE in ReplicationManager.java > -- > > Key: HDDS-1882 > URL: https://issues.apache.org/jira/browse/HDDS-1882 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1882) TestReplicationManager failed with NPE
Sammi Chen created HDDS-1882: Summary: TestReplicationManager failed with NPE Key: HDDS-1882 URL: https://issues.apache.org/jira/browse/HDDS-1882 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Sammi Chen Assignee: Sammi Chen -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1879) Support multiple excluded scopes when choosing datanodes in NetworkTopology
Sammi Chen created HDDS-1879: Summary: Support multiple excluded scopes when choosing datanodes in NetworkTopology Key: HDDS-1879 URL: https://issues.apache.org/jira/browse/HDDS-1879 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Sammi Chen Assignee: Sammi Chen -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
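A sketch of the eligibility check behind "choose a node under one scope while avoiding several excluded scopes". The method name and the plain prefix matching are simplifying assumptions; a real implementation would match whole topology path components (so "/rack1" does not exclude "/rack10").

{code:java}
import java.util.List;

/**
 * Sketch: scopes are topology paths like "/rack1"; a node is in a scope
 * if its location falls under that path. Not the NetworkTopology API.
 */
class ScopeFilter {
  static boolean isEligible(String nodeLocation, String scope,
                            List<String> excludedScopes) {
    if (!nodeLocation.startsWith(scope)) {
      return false; // outside the requested scope
    }
    for (String excluded : excludedScopes) {
      if (nodeLocation.startsWith(excluded)) {
        return false; // inside one of the excluded scopes
      }
    }
    return true;
  }
}
{code}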
[jira] [Updated] (HDDS-1865) Use "ozone.network.topology.aware.read" to control both RPC client and server side logic
[ https://issues.apache.org/jira/browse/HDDS-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-1865: - Summary: Use "ozone.network.topology.aware.read" to control both RPC client and server side logic (was: Use "dfs.network.topology.aware.read.enable" to control both RPC client and server side logic ) > Use "ozone.network.topology.aware.read" to control both RPC client and server > side logic > - > > Key: HDDS-1865 > URL: https://issues.apache.org/jira/browse/HDDS-1865 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
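A minimal sketch of the unified switch, using the stock Hadoop Configuration API; the default value shown here is an assumption.

{code:java}
import org.apache.hadoop.conf.Configuration;

/**
 * Sketch: one key now governs topology-aware read on both the RPC client
 * and the server side, instead of two separately named flags.
 */
class TopologyAwareReadConfig {
  static final String KEY = "ozone.network.topology.aware.read";

  static boolean isEnabled(Configuration conf) {
    return conf.getBoolean(KEY, false); // assumed default
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setBoolean(KEY, true); // same flag consulted by client and server
    System.out.println(isEnabled(conf));
  }
}
{code}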
[jira] [Updated] (HDDS-1865) Use "dfs.network.topology.aware.read.enable" to control both clien…
[ https://issues.apache.org/jira/browse/HDDS-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-1865: - Summary: Use "dfs.network.topology.aware.read.enable" to control both clien… (was: Use "ozone.distance.aware.read.enable" to control both client and OM side distance aware read logic) > Use "dfs.network.topology.aware.read.enable" to control both clien… > --- > > Key: HDDS-1865 > URL: https://issues.apache.org/jira/browse/HDDS-1865 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1865) Use "dfs.network.topology.aware.read.enable" to control both RPC client and server side logic
[ https://issues.apache.org/jira/browse/HDDS-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-1865: - Summary: Use "dfs.network.topology.aware.read.enable" to control both RPC client and server side logic (was: Use "dfs.network.topology.aware.read.enable" to control both clien…) > Use "dfs.network.topology.aware.read.enable" to control both RPC client and > server side logic > -- > > Key: HDDS-1865 > URL: https://issues.apache.org/jira/browse/HDDS-1865 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1865) Use "ozone.distance.aware.read.enable" to control both client and OM side distance aware read logic
[ https://issues.apache.org/jira/browse/HDDS-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-1865: - Summary: Use "ozone.distance.aware.read.enable" to control both client and OM side distance aware read logic (was: Use "ozone.distance.aware.read.enable" to control both client side and OM side topology aware read logic) > Use "ozone.distance.aware.read.enable" to control both client and OM side > distance aware read logic > --- > > Key: HDDS-1865 > URL: https://issues.apache.org/jira/browse/HDDS-1865 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1865) Use "ozone.distance.aware.read.enable" to control both client side and OM side topology aware read logic
[ https://issues.apache.org/jira/browse/HDDS-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-1865: - Summary: Use "ozone.distance.aware.read.enable" to control both client side and OM side topology aware read logic (was: Use "dfs.network.topology.aware.read.enable" to control both client side and OM side topology aware read logic) > Use "ozone.distance.aware.read.enable" to control both client side and OM > side topology aware read logic > > > Key: HDDS-1865 > URL: https://issues.apache.org/jira/browse/HDDS-1865 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1865) Use "dfs.network.topology.aware.read.enable" to control both client side and OM side topology aware read logic
Sammi Chen created HDDS-1865: Summary: Use "dfs.network.topology.aware.read.enable" to control both client side and OM side topology aware read logic Key: HDDS-1865 URL: https://issues.apache.org/jira/browse/HDDS-1865 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Sammi Chen Assignee: Sammi Chen -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1864) Turn on topology aware read in TestFailureHandlingByClient
Sammi Chen created HDDS-1864: Summary: Turn on topology aware read in TestFailureHandlingByClient Key: HDDS-1864 URL: https://issues.apache.org/jira/browse/HDDS-1864 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Sammi Chen Assignee: Sammi Chen -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1707) SCMContainerPlacementRackAware#chooseDatanodes throws not enough datanodes when all nodes(40) are up
[ https://issues.apache.org/jira/browse/HDDS-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16893282#comment-16893282 ] Sammi Chen commented on HDDS-1707: -- Thanks [~msingh] for reporting this. It has been fixed by the code change in HDDS-1713. > SCMContainerPlacementRackAware#chooseDatanodes throws not enough datanodes > when all nodes(40) are up > > > Key: HDDS-1707 > URL: https://issues.apache.org/jira/browse/HDDS-1707 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Reporter: Mukul Kumar Singh >Priority: Major > > SCMContainerPlacementRackAware#chooseDatanodes is failing with the following > error repeatedly. > {code} > 2019-06-17 22:15:52,455 WARN > org.apache.hadoop.hdds.scm.container.ReplicationManager: Exception while > replicating container 407. > org.apache.hadoop.hdds.scm.exceptions.SCMException: No enough datanodes to > choose. > at > org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackAware.chooseDatanodes(SCMContainerPlacementRackAware.java:100) > at > org.apache.hadoop.hdds.scm.container.ReplicationManager.handleUnderReplicatedContainer(ReplicationManager.java:487) > at > org.apache.hadoop.hdds.scm.container.ReplicationManager.processContainer(ReplicationManager.java:293) > at > java.util.concurrent.ConcurrentHashMap$KeySetView.forEach(ConcurrentHashMap.java:4649) > at > java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080) > at > org.apache.hadoop.hdds.scm.container.ReplicationManager.run(ReplicationManager.java:205) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-1707) SCMContainerPlacementRackAware#chooseDatanodes throws not enough datanodes when all nodes(40) are up
[ https://issues.apache.org/jira/browse/HDDS-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen resolved HDDS-1707. -- Resolution: Fixed Assignee: Sammi Chen > SCMContainerPlacementRackAware#chooseDatanodes throws not enough datanodes > when all nodes(40) are up > > > Key: HDDS-1707 > URL: https://issues.apache.org/jira/browse/HDDS-1707 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Reporter: Mukul Kumar Singh >Assignee: Sammi Chen >Priority: Major > > SCMContainerPlacementRackAware#chooseDatanodes is failing with the following > error repeatedly. > {code} > 2019-06-17 22:15:52,455 WARN > org.apache.hadoop.hdds.scm.container.ReplicationManager: Exception while > replicating container 407. > org.apache.hadoop.hdds.scm.exceptions.SCMException: No enough datanodes to > choose. > at > org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackAware.chooseDatanodes(SCMContainerPlacementRackAware.java:100) > at > org.apache.hadoop.hdds.scm.container.ReplicationManager.handleUnderReplicatedContainer(ReplicationManager.java:487) > at > org.apache.hadoop.hdds.scm.container.ReplicationManager.processContainer(ReplicationManager.java:293) > at > java.util.concurrent.ConcurrentHashMap$KeySetView.forEach(ConcurrentHashMap.java:4649) > at > java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080) > at > org.apache.hadoop.hdds.scm.container.ReplicationManager.run(ReplicationManager.java:205) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-1809) Ozone Read fails with StatusRunTimeExceptions after 2 datanode fail in Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen resolved HDDS-1809. -- Resolution: Fixed > Ozone Read fails with StatusRunTimeExceptions after 2 datanode fail in Ratis > pipeline > - > > Key: HDDS-1809 > URL: https://issues.apache.org/jira/browse/HDDS-1809 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Sammi Chen >Priority: Major > Fix For: 0.5.0 > > > {code:java} > java.io.IOException: Unexpected OzoneException: java.io.IOException: > java.util.concurrent.ExecutionException: > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io > exception > at > org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:342) > at > org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkFromContainer(ChunkInputStream.java:307) > at > org.apache.hadoop.hdds.scm.storage.ChunkInputStream.prepareRead(ChunkInputStream.java:259) > at > org.apache.hadoop.hdds.scm.storage.ChunkInputStream.read(ChunkInputStream.java:144) > at > org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:239) > at > org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171) > at > org.apache.hadoop.ozone.client.io.OzoneInputStream.read(OzoneInputStream.java:47) > at java.io.InputStream.read(InputStream.java:101) > at > org.apache.hadoop.ozone.container.ContainerTestHelper.validateData(ContainerTestHelper.java:709) > at > org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient.validateData(TestFailureHandlingByClient.java:458) > at > org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient.testBlockWritesWithDnFailures(TestFailureHandlingByClient.java:158) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at org.junit.runner.JUnitCore.run(JUnitCore.java:160) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > at 
com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1855) TestStorageContainerManager#testScmProcessDatanodeHeartbeat is failing
[ https://issues.apache.org/jira/browse/HDDS-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-1855: - Resolution: Fixed Status: Resolved (was: Patch Available) > TestStorageContainerManager#testScmProcessDatanodeHeartbeat is failing > -- > > Key: HDDS-1855 > URL: https://issues.apache.org/jira/browse/HDDS-1855 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Nanda kumar >Assignee: Nanda kumar >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > {{TestStorageContainerManager#testScmProcessDatanodeHeartbeat}} is failing > with the following exception > {noformat} > [ERROR] Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: > 106.315 s <<< FAILURE! - in > org.apache.hadoop.ozone.TestStorageContainerManager > [ERROR] > testScmProcessDatanodeHeartbeat(org.apache.hadoop.ozone.TestStorageContainerManager) > Time elapsed: 21.97 s <<< FAILURE! > java.lang.AssertionError > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.hadoop.ozone.TestStorageContainerManager.testScmProcessDatanodeHeartbeat(TestStorageContainerManager.java:531) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-1751) replication of underReplicated container fails with SCMContainerPlacementRackAware policy
[ https://issues.apache.org/jira/browse/HDDS-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen resolved HDDS-1751. -- Resolution: Fixed > replication of underReplicated container fails with > SCMContainerPlacementRackAware policy > - > > Key: HDDS-1751 > URL: https://issues.apache.org/jira/browse/HDDS-1751 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Sammi Chen >Priority: Major > Labels: MiniOzoneChaosCluster > > SCM container replication fails with > {code} > 2019-07-02 18:26:41,564 WARN container.ReplicationManager > (ReplicationManager.java:handleUnderReplicatedContainer(501)) - Exception > while replicating container 18. > org.apache.hadoop.hdds.scm.exceptions.SCMException: No enough datanodes to > choose. > at > org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackAware.chooseDatanodes(SCMContainerPlacementRackAware.java:100) > at > org.apache.hadoop.hdds.scm.container.ReplicationManager.handleUnderReplicatedContainer(ReplicationManager.java:487) > at > org.apache.hadoop.hdds.scm.container.ReplicationManager.processContainer(ReplicationManager.java:293) > at > java.util.concurrent.ConcurrentHashMap$KeySetView.forEach(ConcurrentHashMap.java:4649) > at > java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080) > at > org.apache.hadoop.hdds.scm.container.ReplicationManager.run(ReplicationManager.java:205) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1751) replication of underReplicated container fails with SCMContainerPlacementRackAware policy
[ https://issues.apache.org/jira/browse/HDDS-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890814#comment-16890814 ] Sammi Chen commented on HDDS-1751: -- Yes, it's fixed by HDDS-1713. I ran "src/test/bin/start-chaos.sh" locally with SCMContainerPlacementRackAware as the placement policy. Here is the log:
2019-07-23 16:57:38,336 INFO container.ReplicationManager (ReplicationManager.java:handleUnderReplicatedContainer(489)) - Container #3 is under replicated. Expected replica count is 3, but found 2.
2019-07-23 16:57:38,336 INFO container.ReplicationManager (ReplicationManager.java:sendReplicateCommand(652)) - Sending replicate container command for container #3 to datanode e4635174-5f4b-4141-aea3-d994486370aa{ip: 127.0.0.1, host: vm-centos, networkLocation: /default-rack, certSerialId: null}
2019-07-23 16:57:38,336 INFO container.ReplicationManager (ReplicationManager.java:handleUnderReplicatedContainer(489)) - Container #9 is under replicated. Expected replica count is 3, but found 2.
2019-07-23 16:57:38,336 INFO container.ReplicationManager (ReplicationManager.java:sendReplicateCommand(652)) - Sending replicate container command for container #9 to datanode 718da402-1433-4b44-8479-0a42e47929fd{ip: 127.0.0.1, host: vm-centos, networkLocation: /default-rack, certSerialId: null}
2019-07-23 16:57:38,336 INFO container.ReplicationManager (ReplicationManager.java:handleUnderReplicatedContainer(489)) - Container #10 is under replicated. Expected replica count is 3, but found 2.
2019-07-23 16:57:38,336 INFO container.ReplicationManager (ReplicationManager.java:sendReplicateCommand(652)) - Sending replicate container command for container #10 to datanode 718da402-1433-4b44-8479-0a42e47929fd{ip: 127.0.0.1, host: vm-centos, networkLocation: /default-rack, certSerialId: null}
> replication of underReplicated container fails with > SCMContainerPlacementRackAware policy > - > > Key: HDDS-1751 > URL: https://issues.apache.org/jira/browse/HDDS-1751 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Sammi Chen >Priority: Major > Labels: MiniOzoneChaosCluster > > SCM container replication fails with > {code} > 2019-07-02 18:26:41,564 WARN container.ReplicationManager > (ReplicationManager.java:handleUnderReplicatedContainer(501)) - Exception > while replicating container 18. > org.apache.hadoop.hdds.scm.exceptions.SCMException: No enough datanodes to > choose. > at > org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackAware.chooseDatanodes(SCMContainerPlacementRackAware.java:100) > at > org.apache.hadoop.hdds.scm.container.ReplicationManager.handleUnderReplicatedContainer(ReplicationManager.java:487) > at > org.apache.hadoop.hdds.scm.container.ReplicationManager.processContainer(ReplicationManager.java:293) > at > java.util.concurrent.ConcurrentHashMap$KeySetView.forEach(ConcurrentHashMap.java:4649) > at > java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080) > at > org.apache.hadoop.hdds.scm.container.ReplicationManager.run(ReplicationManager.java:205) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-1751) replication of underReplicated container fails with SCMContainerPlacementRackAware policy
[ https://issues.apache.org/jira/browse/HDDS-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen reassigned HDDS-1751: Assignee: Sammi Chen > replication of underReplicated container fails with > SCMContainerPlacementRackAware policy > - > > Key: HDDS-1751 > URL: https://issues.apache.org/jira/browse/HDDS-1751 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Sammi Chen >Priority: Major > Labels: MiniOzoneChaosCluster > > SCM container replication fails with > {code} > 2019-07-02 18:26:41,564 WARN container.ReplicationManager > (ReplicationManager.java:handleUnderReplicatedContainer(501)) - Exception > while replicating container 18. > org.apache.hadoop.hdds.scm.exceptions.SCMException: No enough datanodes to > choose. > at > org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackAware.chooseDatanodes(SCMContainerPlacementRackAware.java:100) > at > org.apache.hadoop.hdds.scm.container.ReplicationManager.handleUnderReplicatedContainer(ReplicationManager.java:487) > at > org.apache.hadoop.hdds.scm.container.ReplicationManager.processContainer(ReplicationManager.java:293) > at > java.util.concurrent.ConcurrentHashMap$KeySetView.forEach(ConcurrentHashMap.java:4649) > at > java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080) > at > org.apache.hadoop.hdds.scm.container.ReplicationManager.run(ReplicationManager.java:205) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1809) Ozone Read fails with StatusRunTimeExceptions after 2 datanode fail in Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890729#comment-16890729 ] Sammi Chen commented on HDDS-1809: -- Thanks [~shashikant] for reporting this issue. It has been fixed by the code change in HDDS-1713. The root cause is that the network topology previously used the IP address as the node key in the topology cluster, so the three sorted datanodes all resolved to the same node. Now the datanode UUID is used as the node key in the topology cluster, so the sorted datanodes are three distinct nodes. > Ozone Read fails with StatusRunTimeExceptions after 2 datanode fail in Ratis > pipeline > - > > Key: HDDS-1809 > URL: https://issues.apache.org/jira/browse/HDDS-1809 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Sammi Chen >Priority: Major > Fix For: 0.5.0 > > > {code:java} > java.io.IOException: Unexpected OzoneException: java.io.IOException: > java.util.concurrent.ExecutionException: > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io > exception > at > org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:342) > at > org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkFromContainer(ChunkInputStream.java:307) > at > org.apache.hadoop.hdds.scm.storage.ChunkInputStream.prepareRead(ChunkInputStream.java:259) > at > org.apache.hadoop.hdds.scm.storage.ChunkInputStream.read(ChunkInputStream.java:144) > at > org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:239) > at > org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171) > at > org.apache.hadoop.ozone.client.io.OzoneInputStream.read(OzoneInputStream.java:47) > at java.io.InputStream.read(InputStream.java:101) > at > org.apache.hadoop.ozone.container.ContainerTestHelper.validateData(ContainerTestHelper.java:709) > at > org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient.validateData(TestFailureHandlingByClient.java:458) > at > org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient.testBlockWritesWithDnFailures(TestFailureHandlingByClient.java:158) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at 
org.junit.runner.JUnitCore.run(JUnitCore.java:160) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1653) Add option to "ozone scmcli printTopology" to order the output according to topology layer
[ https://issues.apache.org/jira/browse/HDDS-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-1653: - Resolution: Fixed Fix Version/s: 0.5.0 Status: Resolved (was: Patch Available) > Add option to "ozone scmcli printTopology" to order the output according to > topology layer > --- > > Key: HDDS-1653 > URL: https://issues.apache.org/jira/browse/HDDS-1653 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Sammi Chen >Assignee: Xiaoyu Yao >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Add option to order the output according to topology layer.
> For example, for a /rack/node topology, we can show:
> State = HEALTHY
> /default-rack:
>   ozone_datanode_1.ozone_default/172.18.0.3
>   ozone_datanode_2.ozone_default/172.18.0.2
>   ozone_datanode_3.ozone_default/172.18.0.4
> /rack1:
>   ozone_datanode_4.ozone_default/172.18.0.5
>   ozone_datanode_5.ozone_default/172.18.0.6
> For a /dc/rack/node topology, we can either show:
> State = HEALTHY
> /default-dc/default-rack:
>   ozone_datanode_1.ozone_default/172.18.0.3
>   ozone_datanode_2.ozone_default/172.18.0.2
>   ozone_datanode_3.ozone_default/172.18.0.4
> /dc1/rack1:
>   ozone_datanode_4.ozone_default/172.18.0.5
>   ozone_datanode_5.ozone_default/172.18.0.6
> or
> State = HEALTHY
> default-dc:
>   default-rack:
>     ozone_datanode_1.ozone_default/172.18.0.3
>     ozone_datanode_2.ozone_default/172.18.0.2
>     ozone_datanode_3.ozone_default/172.18.0.4
> dc1:
>   rack1:
>     ozone_datanode_4.ozone_default/172.18.0.5
>     ozone_datanode_5.ozone_default/172.18.0.6
-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
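A sketch of the grouping such ordered output needs: collect nodes under their network location with a sorted map. The TopoNode record is a simplified stand-in for the datanode details, not the scmcli implementation.

{code:java}
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Illustrative node: hostname/IP plus its topology location. */
record TopoNode(String host, String location) {}

class PrintTopologyByLayer {
  /** Group nodes under their network location, sorted for stable output. */
  static String render(List<TopoNode> nodes) {
    Map<String, List<String>> byLocation = nodes.stream().collect(
        Collectors.groupingBy(TopoNode::location, TreeMap::new,
            Collectors.mapping(TopoNode::host, Collectors.toList())));
    StringBuilder sb = new StringBuilder();
    byLocation.forEach((loc, hosts) -> {
      sb.append(loc).append(":\n");
      hosts.forEach(h -> sb.append("  ").append(h).append('\n'));
    });
    return sb.toString();
  }
}
{code}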
[jira] [Commented] (HDDS-1787) NPE thrown while trying to find DN closest to client
[ https://issues.apache.org/jira/browse/HDDS-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16885781#comment-16885781 ] Sammi Chen commented on HDDS-1787: -- Hi [~msingh], thanks for the instructions. I will try it locally. I also created a unit test which reproduced the issue. > NPE thrown while trying to find DN closest to client > > > Key: HDDS-1787 > URL: https://issues.apache.org/jira/browse/HDDS-1787 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.5.0 >Reporter: Siddharth Wagle >Assignee: Sammi Chen >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > cc: [~xyao] This seems related to the client side topology changes, not sure > if some other Jira is already addressing this. > {code} > 2019-07-10 16:45:53,176 WARN ipc.Server (Server.java:logException(2724)) - > IPC Server handler 14 on 35066, call Call#127037 Retry#0 > org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocol.send from 17 > 2.31.116.73:52540 > java.lang.NullPointerException > at > org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.lambda$sortDatanodes$0(ScmBlockLocationProtocolServerSideTranslatorPB.java:215) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) > at > java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580) > at > org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.sortDatanodes(ScmBlockLocationProtocolServerSideTranslatorPB.java:215) > at > org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:124) > at > org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:13157) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > 2019-07-10 16:45:53,176 WARN om.KeyManagerImpl > (KeyManagerImpl.java:lambda$sortDatanodeInPipeline$7(2129)) - Unable to sort > datanodes based on distance to client, volume=xqoyzocpse, bucket=vxwajaczqh, > key=pool-444-thread-7-201077822, client=127.0.0.1, > datanodes=[10f15723-45d7-4a0c-8f01-8b101744a110{ip: 172.31.116.73, host: > sid-minichaos.gce.cloudera.com, networkLocation: /default-rack, certSerialId: > null}, 7ac2777f-0a5c-4414-9e7f-bfbc47d696ea{ip: 172.31.116.73, host: > sid-minichaos.gce.cloudera.com, networkLocation: /default-rack, certSerialId: > null}], exception=java.lang.NullPointerException > at > org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.lambda$sortDatanodes$0(ScmBlockLocationProtocolServerSideTranslatorPB.java:215) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) > at > java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580) > at > 
org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.sortDatanodes(ScmBlockLocationProtocolServerSideTranslatorPB.java:215) > at > org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:124) > at > org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:13157) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > {code} --
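Purely illustrative: the NPE above comes from a lambda dereferencing a topology node that could not be resolved for a client address. A null-tolerant lookup, with simplified types (not the actual translator code), would skip such addresses instead of letting null flow into the sort.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Sketch of a null-safe resolve step for the sort-datanodes path:
 * addresses that cannot be mapped to a topology node are skipped
 * (and could be logged) rather than dereferenced later.
 */
class SortDatanodesSafely {
  static List<String> resolve(List<String> addresses,
                              Map<String, String> nodeByAddress) {
    List<String> resolved = new ArrayList<>();
    for (String addr : addresses) {
      String node = nodeByAddress.get(addr);
      if (node == null) {
        continue; // unresolvable address: skip rather than NPE later
      }
      resolved.add(node);
    }
    return resolved;
  }
}
{code}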
[jira] [Commented] (HDDS-1787) NPE thrown while trying to find DN closest to client
[ https://issues.apache.org/jira/browse/HDDS-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16885202#comment-16885202 ] Sammi Chen commented on HDDS-1787: -- Hi [~swagle], I uploaded a patch. Could you help me to review it? > NPE thrown while trying to find DN closest to client > > > Key: HDDS-1787 > URL: https://issues.apache.org/jira/browse/HDDS-1787 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.5.0 >Reporter: Siddharth Wagle >Assignee: Sammi Chen >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > cc: [~xyao] This seems related to the client side topology changes, not sure > if some other Jira is already addressing this. > {code} > 2019-07-10 16:45:53,176 WARN ipc.Server (Server.java:logException(2724)) - > IPC Server handler 14 on 35066, call Call#127037 Retry#0 > org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocol.send from 17 > 2.31.116.73:52540 > java.lang.NullPointerException > at > org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.lambda$sortDatanodes$0(ScmBlockLocationProtocolServerSideTranslatorPB.java:215) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) > at > java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580) > at > org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.sortDatanodes(ScmBlockLocationProtocolServerSideTranslatorPB.java:215) > at > org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:124) > at > org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:13157) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > 2019-07-10 16:45:53,176 WARN om.KeyManagerImpl > (KeyManagerImpl.java:lambda$sortDatanodeInPipeline$7(2129)) - Unable to sort > datanodes based on distance to client, volume=xqoyzocpse, bucket=vxwajaczqh, > key=pool-444-thread-7-201077822, client=127.0.0.1, > datanodes=[10f15723-45d7-4a0c-8f01-8b101744a110{ip: 172.31.116.73, host: > sid-minichaos.gce.cloudera.com, networkLocation: /default-rack, certSerialId: > null}, 7ac2777f-0a5c-4414-9e7f-bfbc47d696ea{ip: 172.31.116.73, host: > sid-minichaos.gce.cloudera.com, networkLocation: /default-rack, certSerialId: > null}], exception=java.lang.NullPointerException > at > org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.lambda$sortDatanodes$0(ScmBlockLocationProtocolServerSideTranslatorPB.java:215) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) > at > java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580) > at > 
org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.sortDatanodes(ScmBlockLocationProtocolServerSideTranslatorPB.java:215) > at > org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:124) > at > org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:13157) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) -
[jira] [Comment Edited] (HDDS-1787) NPE thrown while trying to find DN closest to client
[ https://issues.apache.org/jira/browse/HDDS-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884825#comment-16884825 ] Sammi Chen edited comment on HDDS-1787 at 7/15/19 9:01 AM: --- Hi [~swagle], I would like to know how to run the MiniOzoneChaos cluster to verify the issue is fixed. TestMiniChaosOzoneCluster cannot reproduce the issue. was (Author: sammi): Hi [~swagle], I would like to know how to run the MiniOzoneChaos cluster to verify the issue is fixed. > NPE thrown while trying to find DN closest to client > > > Key: HDDS-1787 > URL: https://issues.apache.org/jira/browse/HDDS-1787 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.5.0 >Reporter: Siddharth Wagle >Assignee: Sammi Chen >Priority: Major > > cc: [~xyao] This seems related to the client side topology changes, not sure > if some other Jira is already addressing this. > {code} > 2019-07-10 16:45:53,176 WARN ipc.Server (Server.java:logException(2724)) - > IPC Server handler 14 on 35066, call Call#127037 Retry#0 > org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocol.send from 17 > 2.31.116.73:52540 > java.lang.NullPointerException > at > org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.lambda$sortDatanodes$0(ScmBlockLocationProtocolServerSideTranslatorPB.java:215) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) > at > java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580) > at > org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.sortDatanodes(ScmBlockLocationProtocolServerSideTranslatorPB.java:215) > at > org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:124) > at > org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:13157) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > 2019-07-10 16:45:53,176 WARN om.KeyManagerImpl > (KeyManagerImpl.java:lambda$sortDatanodeInPipeline$7(2129)) - Unable to sort > datanodes based on distance to client, volume=xqoyzocpse, bucket=vxwajaczqh, > key=pool-444-thread-7-201077822, client=127.0.0.1, > datanodes=[10f15723-45d7-4a0c-8f01-8b101744a110{ip: 172.31.116.73, host: > sid-minichaos.gce.cloudera.com, networkLocation: /default-rack, certSerialId: > null}, 7ac2777f-0a5c-4414-9e7f-bfbc47d696ea{ip: 172.31.116.73, host: > sid-minichaos.gce.cloudera.com, networkLocation: /default-rack, certSerialId: > null}], exception=java.lang.NullPointerException > at > org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.lambda$sortDatanodes$0(ScmBlockLocationProtocolServerSideTranslatorPB.java:215) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) > at > 
java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580) > at > org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.sortDatanodes(ScmBlockLocationProtocolServerSideTranslatorPB.java:215) > at > org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:124) > at > org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:13157) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGr
[jira] [Commented] (HDDS-1787) NPE thrown while trying to find DN closest to client
[ https://issues.apache.org/jira/browse/HDDS-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884825#comment-16884825 ] Sammi Chen commented on HDDS-1787: -- Hi [~swagle], I would like to know how to run the MiniOzoneChaos cluster to verify the issue is fixed. > NPE thrown while trying to find DN closest to client > > > Key: HDDS-1787 > URL: https://issues.apache.org/jira/browse/HDDS-1787 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.5.0 >Reporter: Siddharth Wagle >Assignee: Sammi Chen >Priority: Major > > cc: [~xyao] This seems related to the client side topology changes, not sure > if some other Jira is already addressing this. > {code} > 2019-07-10 16:45:53,176 WARN ipc.Server (Server.java:logException(2724)) - > IPC Server handler 14 on 35066, call Call#127037 Retry#0 > org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocol.send from > 172.31.116.73:52540 > java.lang.NullPointerException > at > org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.lambda$sortDatanodes$0(ScmBlockLocationProtocolServerSideTranslatorPB.java:215) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) > at > java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580) > at > org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.sortDatanodes(ScmBlockLocationProtocolServerSideTranslatorPB.java:215) > at > org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:124) > at > org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:13157) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > 2019-07-10 16:45:53,176 WARN om.KeyManagerImpl > (KeyManagerImpl.java:lambda$sortDatanodeInPipeline$7(2129)) - Unable to sort > datanodes based on distance to client, volume=xqoyzocpse, bucket=vxwajaczqh, > key=pool-444-thread-7-201077822, client=127.0.0.1, > datanodes=[10f15723-45d7-4a0c-8f01-8b101744a110{ip: 172.31.116.73, host: > sid-minichaos.gce.cloudera.com, networkLocation: /default-rack, certSerialId: > null}, 7ac2777f-0a5c-4414-9e7f-bfbc47d696ea{ip: 172.31.116.73, host: > sid-minichaos.gce.cloudera.com, networkLocation: /default-rack, certSerialId: > null}], exception=java.lang.NullPointerException > at > org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.lambda$sortDatanodes$0(ScmBlockLocationProtocolServerSideTranslatorPB.java:215) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) > at > java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580) > at > org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.sortDatanodes(ScmBlockLocationProtocolServerSideTranslatorPB.java:215) > at > org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:124) > at > org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:13157) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016)
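The trace above points at the lambda in ScmBlockLocationProtocolServerSideTranslatorPB.sortDatanodes dereferencing a lookup result that can be null when a datanode is not (yet) present in the network topology. The following is a minimal, self-contained Java sketch of that failure pattern and a defensive null-guard; every name in it is a hypothetical stand-in for illustration, not the actual Ozone code:
{code}
import java.util.*;
import java.util.stream.Collectors;

public class SortDatanodesSketch {
  // Hypothetical stand-in for the SCM node registry; get() returns null for a
  // datanode that was never added to the network topology (the NPE trigger).
  static final Map<String, String> NODE_LOCATIONS = new HashMap<>();

  static List<String> sortByLocation(List<String> uuids) {
    return uuids.stream()
        // Unguarded code of the form NODE_LOCATIONS.get(uuid).compareTo(...)
        // throws NPE here; falling back to a default location avoids the crash.
        .sorted(Comparator.comparing((String uuid) ->
            Optional.ofNullable(NODE_LOCATIONS.get(uuid))
                    .orElse("/default-rack")))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    NODE_LOCATIONS.put("dn-1", "/rack1");
    // "dn-2" is deliberately left unregistered to exercise the null path.
    System.out.println(sortByLocation(Arrays.asList("dn-2", "dn-1")));
  }
}
{code}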
[jira] [Updated] (HDDS-1553) Add metrics in rack aware container placement policy
[ https://issues.apache.org/jira/browse/HDDS-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-1553: - Description: To collect the following statistics: 1. total requested datanode count (A) 2. successfully allocated datanode count without constraint compromise (B) 3. successfully allocated datanode count with some constraint compromise (C) B includes C, failed allocations = (A - B) was: To collect the following statistics: 1. total requested datanode count (A) 2. successfully allocated datanode count without constraint compromise (B) 3. successfully allocated datanode count with some constraint compromise (C) 4. failed datanode allocation count (D) A = B + C + D, B includes C > Add metrics in rack aware container placement policy > > > Key: HDDS-1553 > URL: https://issues.apache.org/jira/browse/HDDS-1553 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Major > > To collect the following statistics: > 1. total requested datanode count (A) > 2. successfully allocated datanode count without constraint compromise (B) > 3. successfully allocated datanode count with some constraint compromise (C) > B includes C, failed allocations = (A - B) -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (HDDS-1553) Add metrics in rack aware container placement policy
[ https://issues.apache.org/jira/browse/HDDS-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-1553: - Description: To collect the following statistics: 1. total requested datanode count (A) 2. successfully allocated datanode count without constraint compromise (B) 3. successfully allocated datanode count with some constraint compromise (C) 4. failed datanode allocation count (D) A = B + C + D, B includes C was: To collect the following statistics: 1. total requested datanode count (A) 2. successfully allocated datanode count without constraint compromise (B) 3. successfully allocated datanode count with some constraint compromise (C) 4. failed datanode allocation count (D) A = B + C + D > Add metrics in rack aware container placement policy > > > Key: HDDS-1553 > URL: https://issues.apache.org/jira/browse/HDDS-1553 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Major > > To collect the following statistics: > 1. total requested datanode count (A) > 2. successfully allocated datanode count without constraint compromise (B) > 3. successfully allocated datanode count with some constraint compromise (C) > 4. failed datanode allocation count (D) > A = B + C + D, B includes C -- This message was sent by Atlassian JIRA (v7.6.14#76016)
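Since the two revisions of the description word the counter relationships slightly differently, here is one consistent reading as a hypothetical Java sketch (field and method names are illustrative, not the actual Ozone metrics class): A counts every requested datanode, B counts every successful allocation, C is the subset of B that compromised a constraint, and failures work out to A - B:
{code}
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch of the proposed placement metrics; not the real class. */
public class PlacementMetricsSketch {
  private final LongAdder requested = new LongAdder();             // A
  private final LongAdder succeeded = new LongAdder();             // B
  private final LongAdder succeededCompromised = new LongAdder();  // C (subset of B)

  public void onRequest(int datanodeCount) { requested.add(datanodeCount); }

  public void onSuccess(boolean constraintCompromised) {
    succeeded.increment();                    // B counts every success, so
    if (constraintCompromised) {              // C is always included in B.
      succeededCompromised.increment();
    }
  }

  /** D under this reading: failed allocations = A - B. */
  public long failedAllocations() { return requested.sum() - succeeded.sum(); }
}
{code}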
[jira] [Created] (HDDS-1663) Add datanode to network topology cluster during node registration
Sammi Chen created HDDS-1663: Summary: Add datanode to network topology cluster during node registration Key: HDDS-1663 URL: https://issues.apache.org/jira/browse/HDDS-1663 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Sammi Chen Assignee: Sammi Chen -- This message was sent by Atlassian JIRA (v7.6.3#76005)
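A rough Java sketch of what "add the datanode to the topology during registration" could look like; the class, map layout, and method names below are assumptions for illustration, not the actual SCM registration path:
{code}
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: insert a datanode into an in-memory topology map at
 *  registration time so later locality lookups can resolve it. */
public class NodeRegistrationSketch {
  // networkLocation (e.g. "/rack1") -> datanode ids under that location
  private final Map<String, Set<String>> topology = new ConcurrentHashMap<>();

  /** Called when a datanode registers: record it under its resolved rack. */
  public void register(String datanodeId, String networkLocation) {
    topology.computeIfAbsent(networkLocation, k -> ConcurrentHashMap.newKeySet())
            .add(datanodeId);
  }

  public Set<String> nodesAt(String networkLocation) {
    return topology.getOrDefault(networkLocation, Collections.emptySet());
  }
}
{code}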
[jira] [Updated] (HDDS-1662) Missing test resources of integration-test project in target directory after compile
[ https://issues.apache.org/jira/browse/HDDS-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-1662: - Issue Type: Sub-task (was: Bug) Parent: HDDS-698 > Missing test resources of integration-test project in target directory after > compile > - > > Key: HDDS-1662 > URL: https://issues.apache.org/jira/browse/HDDS-1662 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Major > > The integration-test project's original test resources are missing from the > target directory after compilation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HDDS-1662) Missing test resources of integration-test project in target directory after compile
Sammi Chen created HDDS-1662: Summary: Missing test resources of integration-test project in target directory after compile Key: HDDS-1662 URL: https://issues.apache.org/jira/browse/HDDS-1662 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Sammi Chen Assignee: Sammi Chen The integration-test project's original test resources are missing from the target directory after compilation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
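If the root cause is the module's Maven configuration rather than the build tool itself, the usual remedy is to declare the test resources explicitly so they are copied into target/test-classes during the build. A hedged example of such a pom.xml fragment (the directory path is an assumption, not necessarily the fix committed for this issue):
{code}
<build>
  <testResources>
    <!-- Assumed layout: copy src/test/resources into target/test-classes -->
    <testResource>
      <directory>src/test/resources</directory>
      <filtering>false</filtering>
    </testResource>
  </testResources>
</build>
{code}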
[jira] [Assigned] (HDDS-1661) Consolidate hadoop-hdds and hadoop-ozone into hadoop-ozone-project
[ https://issues.apache.org/jira/browse/HDDS-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen reassigned HDDS-1661: Assignee: (was: Sammi Chen) > Consolidate hadoop-hdds and hadoop-ozone into hadoop-ozone-project > -- > > Key: HDDS-1661 > URL: https://issues.apache.org/jira/browse/HDDS-1661 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Eric Yang >Priority: Major > > The Ozone source code is somewhat fragmented within the Hadoop source tree. The current > code looks like: > {code} > hadoop/pom.ozone.xml > ├── hadoop-hdds > └── hadoop-ozone > {code} > It is helpful to consolidate the project into a high-level grouping such as: > {code} > hadoop > └── hadoop-ozone-project/pom.xml > └── hadoop-ozone-project/hadoop-hdds > └── hadoop-ozone-project/hadoop-ozone > {code} > This allows users to build Ozone from the hadoop-ozone-project directory. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HDDS-1661) Consolidate hadoop-hdds and hadoop-ozone into hadoop-ozone-project
[ https://issues.apache.org/jira/browse/HDDS-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen reassigned HDDS-1661: Assignee: Sammi Chen (was: Bharat Viswanadham) > Consolidate hadoop-hdds and hadoop-ozone into hadoop-ozone-project > -- > > Key: HDDS-1661 > URL: https://issues.apache.org/jira/browse/HDDS-1661 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Eric Yang >Assignee: Sammi Chen >Priority: Major > > The Ozone source code is somewhat fragmented within the Hadoop source tree. The current > code looks like: > {code} > hadoop/pom.ozone.xml > ├── hadoop-hdds > └── hadoop-ozone > {code} > It is helpful to consolidate the project into a high-level grouping such as: > {code} > hadoop > └── hadoop-ozone-project/pom.xml > └── hadoop-ozone-project/hadoop-hdds > └── hadoop-ozone-project/hadoop-ozone > {code} > This allows users to build Ozone from the hadoop-ozone-project directory. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
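As a usage illustration of the proposed layout, a user could then build all of Ozone from the new top-level module with a standard Maven invocation (the exact command is an assumption; the issue does not specify one):
{code}
cd hadoop-ozone-project
mvn clean install -DskipTests
{code}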
[jira] [Created] (HDDS-1653) Add option to "ozone scmcli printTopology" to order the output according to topology layer
Sammi Chen created HDDS-1653: Summary: Add option to "ozone scmcli printTopology" to order the output according to topology layer Key: HDDS-1653 URL: https://issues.apache.org/jira/browse/HDDS-1653 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Sammi Chen Add option to order the output according to topology layer. For example, for /rack/node topology, we can show: State = HEALTHY /default-rack: ozone_datanode_1.ozone_default/172.18.0.3 ozone_datanode_2.ozone_default/172.18.0.2 ozone_datanode_3.ozone_default/172.18.0.4 /rack1: ozone_datanode_4.ozone_default/172.18.0.5 ozone_datanode_5.ozone_default/172.18.0.6 For /dc/rack/node topology, we can either show State = HEALTHY /default-dc/default-rack: ozone_datanode_1.ozone_default/172.18.0.3 ozone_datanode_2.ozone_default/172.18.0.2 ozone_datanode_3.ozone_default/172.18.0.4 /dc1/rack1: ozone_datanode_4.ozone_default/172.18.0.5 ozone_datanode_5.ozone_default/172.18.0.6 or State = HEALTHY default-dc: default-rack: ozone_datanode_1.ozone_default/172.18.0.3 ozone_datanode_2.ozone_default/172.18.0.2 ozone_datanode_3.ozone_default/172.18.0.4 dc1: rack1: ozone_datanode_4.ozone_default/172.18.0.5 ozone_datanode_5.ozone_default/172.18.0.6 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
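The proposed output is essentially a group-by on each datanode's network location. A small self-contained Java sketch of that grouping (hypothetical class name and hard-coded sample data, not the actual scmcli implementation):
{code}
import java.util.*;
import java.util.stream.Collectors;

public class PrintTopologySketch {
  public static void main(String[] args) {
    // datanode -> network location, as the SCM topology would report it
    Map<String, String> nodes = new LinkedHashMap<>();
    nodes.put("ozone_datanode_1.ozone_default/172.18.0.3", "/default-rack");
    nodes.put("ozone_datanode_2.ozone_default/172.18.0.2", "/default-rack");
    nodes.put("ozone_datanode_4.ozone_default/172.18.0.5", "/rack1");

    // Group datanodes by topology layer and print one block per location.
    Map<String, List<String>> byLocation = nodes.entrySet().stream()
        .collect(Collectors.groupingBy(Map.Entry::getValue, TreeMap::new,
            Collectors.mapping(Map.Entry::getKey, Collectors.toList())));

    System.out.println("State = HEALTHY");
    byLocation.forEach((location, members) -> {
      System.out.println(location + ":");
      members.forEach(m -> System.out.println("  " + m));
    });
  }
}
{code}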