[jira] [Commented] (HDFS-13369) FSCK Report broken with RequestHedgingProxyProvider

2022-10-12 Thread Sammi Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17616774#comment-17616774
 ] 

Sammi Chen commented on HDFS-13369:
---

Cherry-picked from trunk to branch-3.3.5.

> FSCK Report broken with RequestHedgingProxyProvider 
> 
>
> Key: HDFS-13369
> URL: https://issues.apache.org/jira/browse/HDFS-13369
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.8.3, 3.0.0, 3.1.0
>Reporter: Harshakiran Reddy
>Assignee: Ranith Sardar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.5
>
> Attachments: HDFS-13369.001.patch, HDFS-13369.002.patch, 
> HDFS-13369.003.patch, HDFS-13369.004.patch, HDFS-13369.005.patch, 
> HDFS-13369.006.patch, HDFS-13369.007.patch
>
>
> Scenario:
> 1. Configure the RequestHedgingProxyProvider
> 2. Write some files to the file system
> 3. Run an FSCK report on those files
>  
> {noformat}
> bin> hdfs fsck /file1 -locations -files -blocks
> Exception in thread "main" java.lang.ClassCastException: 
> org.apache.hadoop.hdfs.server.namenode.ha.RequestHedgingProxyProvider$RequestHedgingInvocationHandler
>  cannot be cast to org.apache.hadoop.ipc.RpcInvocationHandler
> at org.apache.hadoop.ipc.RPC.getConnectionIdForProxy(RPC.java:626)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.getConnectionId(RetryInvocationHandler.java:438)
> at org.apache.hadoop.ipc.RPC.getConnectionIdForProxy(RPC.java:628)
> at org.apache.hadoop.ipc.RPC.getServerAddress(RPC.java:611)
> at org.apache.hadoop.hdfs.HAUtil.getAddressOfActive(HAUtil.java:263)
> at 
> org.apache.hadoop.hdfs.tools.DFSck.getCurrentNamenodeAddress(DFSck.java:257)
> at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:319)
> at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:72)
> at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:156)
> at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:153)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
> at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:152)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
> at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:385){noformat}
>  
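For context, a minimal, runnable Java sketch of why the cast fails: RPC.getConnectionIdForProxy() unwraps the proxy's invocation handler and casts it to RpcInvocationHandler, but the hedging provider's handler only implements plain java.lang.reflect.InvocationHandler. The interfaces below are simplified stand-ins for the Hadoop types, not the actual fix.

{code}
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Simplified stand-ins for org.apache.hadoop.ipc.RpcInvocationHandler and
// the NameNode client protocol.
interface RpcInvocationHandler extends InvocationHandler {}
interface ClientProtocol { String getStatus(); }

public class CastFailureDemo {
  // Like RequestHedgingInvocationHandler: implements only InvocationHandler,
  // not RpcInvocationHandler.
  static class HedgingHandler implements InvocationHandler {
    @Override
    public Object invoke(Object proxy, Method method, Object[] args) {
      return "ok";
    }
  }

  public static void main(String[] args) {
    ClientProtocol proxy = (ClientProtocol) Proxy.newProxyInstance(
        ClientProtocol.class.getClassLoader(),
        new Class<?>[] {ClientProtocol.class},
        new HedgingHandler());

    // RPC.getConnectionIdForProxy() performs the equivalent of this cast,
    // which throws ClassCastException for the hedging handler:
    RpcInvocationHandler handler =
        (RpcInvocationHandler) Proxy.getInvocationHandler(proxy);
    System.out.println(handler); // never reached
  }
}
{code}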






[jira] [Updated] (HDFS-13369) FSCK Report broken with RequestHedgingProxyProvider

2022-10-12 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDFS-13369:
--
Fix Version/s: 3.3.5
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> FSCK Report broken with RequestHedgingProxyProvider 
> 
>
> Key: HDFS-13369
> URL: https://issues.apache.org/jira/browse/HDFS-13369
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.8.3, 3.0.0, 3.1.0
>Reporter: Harshakiran Reddy
>Assignee: Ranith Sardar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.5
>
> Attachments: HDFS-13369.001.patch, HDFS-13369.002.patch, 
> HDFS-13369.003.patch, HDFS-13369.004.patch, HDFS-13369.005.patch, 
> HDFS-13369.006.patch, HDFS-13369.007.patch
>
>
> Scenario:
> 1. Configure the RequestHedgingProxyProvider
> 2. Write some files to the file system
> 3. Run an FSCK report on those files
>  
> {noformat}
> bin> hdfs fsck /file1 -locations -files -blocks
> Exception in thread "main" java.lang.ClassCastException: 
> org.apache.hadoop.hdfs.server.namenode.ha.RequestHedgingProxyProvider$RequestHedgingInvocationHandler
>  cannot be cast to org.apache.hadoop.ipc.RpcInvocationHandler
> at org.apache.hadoop.ipc.RPC.getConnectionIdForProxy(RPC.java:626)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.getConnectionId(RetryInvocationHandler.java:438)
> at org.apache.hadoop.ipc.RPC.getConnectionIdForProxy(RPC.java:628)
> at org.apache.hadoop.ipc.RPC.getServerAddress(RPC.java:611)
> at org.apache.hadoop.hdfs.HAUtil.getAddressOfActive(HAUtil.java:263)
> at 
> org.apache.hadoop.hdfs.tools.DFSck.getCurrentNamenodeAddress(DFSck.java:257)
> at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:319)
> at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:72)
> at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:156)
> at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:153)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
> at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:152)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
> at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:385){noformat}
>  






[jira] [Commented] (HDFS-13369) FSCK Report broken with RequestHedgingProxyProvider

2022-10-09 Thread Sammi Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614866#comment-17614866
 ] 

Sammi Chen commented on HDFS-13369:
---

Hi @navinko, could you submit the same PR for branches 2.8.0, 3.0.0, and 3.1.0 too? 

> FSCK Report broken with RequestHedgingProxyProvider 
> 
>
> Key: HDFS-13369
> URL: https://issues.apache.org/jira/browse/HDFS-13369
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.8.3, 3.0.0, 3.1.0
>Reporter: Harshakiran Reddy
>Assignee: Ranith Sardar
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13369.001.patch, HDFS-13369.002.patch, 
> HDFS-13369.003.patch, HDFS-13369.004.patch, HDFS-13369.005.patch, 
> HDFS-13369.006.patch, HDFS-13369.007.patch
>
>
> Scenario:
> 1. Configure the RequestHedgingProxyProvider
> 2. Write some files to the file system
> 3. Run an FSCK report on those files
>  
> {noformat}
> bin> hdfs fsck /file1 -locations -files -blocks
> Exception in thread "main" java.lang.ClassCastException: 
> org.apache.hadoop.hdfs.server.namenode.ha.RequestHedgingProxyProvider$RequestHedgingInvocationHandler
>  cannot be cast to org.apache.hadoop.ipc.RpcInvocationHandler
> at org.apache.hadoop.ipc.RPC.getConnectionIdForProxy(RPC.java:626)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.getConnectionId(RetryInvocationHandler.java:438)
> at org.apache.hadoop.ipc.RPC.getConnectionIdForProxy(RPC.java:628)
> at org.apache.hadoop.ipc.RPC.getServerAddress(RPC.java:611)
> at org.apache.hadoop.hdfs.HAUtil.getAddressOfActive(HAUtil.java:263)
> at 
> org.apache.hadoop.hdfs.tools.DFSck.getCurrentNamenodeAddress(DFSck.java:257)
> at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:319)
> at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:72)
> at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:156)
> at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:153)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
> at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:152)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
> at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:385){noformat}
>  






[jira] [Assigned] (HDDS-2602) Add a property to enable/disable ONE replica pipeline auto creation in SCMPipelineManager

2019-11-21 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen reassigned HDDS-2602:


Assignee: Li Cheng

> Add a property to enable/disable ONE replica pipeline auto creation in 
> SCMPipelineManager
> -
>
> Key: HDDS-2602
> URL: https://issues.apache.org/jira/browse/HDDS-2602
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Sammi Chen
>Assignee: Li Cheng
>Priority: Major
>
> ONE-replica RATIS pipelines are not favored in production clusters. Add a 
> property to disable the automatic creation of ONE-replica RATIS pipelines in 
> SCMPipelineManager.
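For illustration, a minimal sketch of how such a switch could be read on the SCM side; the property name is a guess, and the actual key introduced by this change may differ.

{code}
import org.apache.hadoop.conf.Configuration;

public class PipelineAutoCreationConfig {
  // Hypothetical key name, for illustration only.
  public static final String AUTO_CREATE_FACTOR_ONE_KEY =
      "ozone.scm.pipeline.creation.auto.factor.one";

  /**
   * Defaulting to true preserves the current behavior of auto-creating
   * ONE-replica RATIS pipelines; production clusters can set it to false.
   */
  public static boolean autoCreateFactorOne(Configuration conf) {
    return conf.getBoolean(AUTO_CREATE_FACTOR_ONE_KEY, true);
  }
}
{code}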






[jira] [Created] (HDDS-2602) Add a property to enable/disable ONE replica pipeline auto creation in SCMPipelineManager

2019-11-21 Thread Sammi Chen (Jira)
Sammi Chen created HDDS-2602:


 Summary: Add a property to enable/disable ONE replica pipeline 
auto creation in SCMPipelineManager
 Key: HDDS-2602
 URL: https://issues.apache.org/jira/browse/HDDS-2602
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Sammi Chen


ONE-replica RATIS pipelines are not favored in production clusters. Add a 
property to disable the automatic creation of ONE-replica RATIS pipelines in 
SCMPipelineManager.






[jira] [Created] (HDDS-2540) Fix acceptance test failure introduced by wait_for_safemode_exit

2019-11-18 Thread Sammi Chen (Jira)
Sammi Chen created HDDS-2540:


 Summary: Fix acceptance test failure introduced by 
wait_for_safemode_exit
 Key: HDDS-2540
 URL: https://issues.apache.org/jira/browse/HDDS-2540
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Sammi Chen
Assignee: Sammi Chen


https://github.com/apache/hadoop-ozone/blob/1b72718dcab7f83ebdac67b6242c729f03a8f103/hadoop-ozone/dist/src/main/compose/testlib.sh#L97

The removed line carries a stray trailing single quote inside the bash -c 
string, which breaks the command:

- status=`docker-compose -f "${compose_file}" exec -T scm bash -c 
"kinit -k HTTP/s...@example.com -t /etc/security/keytabs/HTTP.keytab && 
$command'"`
+ status=`docker-compose -f "${compose_file}" exec -T scm bash -c 
"kinit -k HTTP/s...@example.com -t /etc/security/keytabs/HTTP.keytab && 
$command"`






[jira] [Resolved] (HDDS-2499) IsLeader information is lost when update pipeline state

2019-11-18 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen resolved HDDS-2499.
--
Fix Version/s: 0.5.0
   Resolution: Fixed

> IsLeader information is lost when update pipeline state
> ---
>
> Key: HDDS-2499
> URL: https://issues.apache.org/jira/browse/HDDS-2499
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>







[jira] [Created] (HDDS-2499) IsLeader information is lost when update pipeline state

2019-11-14 Thread Sammi Chen (Jira)
Sammi Chen created HDDS-2499:


 Summary: IsLeader information is lost when update pipeline state
 Key: HDDS-2499
 URL: https://issues.apache.org/jira/browse/HDDS-2499
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Sammi Chen
Assignee: Sammi Chen









[jira] [Comment Edited] (HDDS-2249) SortDatanodes does not return correct orders when many DNs on a given host

2019-11-14 Thread Sammi Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974047#comment-16974047
 ] 

Sammi Chen edited comment on HDDS-2249 at 11/14/19 8:47 AM:


Thanks [~swagle] for reporting this.  One idea that comes to mind: how about 
using hostname:port as the key in dnsToUuidMap?  If it works, it might solve 
this issue. 


was (Author: sammi):
Thanks [~swagle] for reporting this.  One idea that comes to mind: how about 
using hostname:port as the key in dnsToUuidMap?  If it works, will it solve 
this issue? 

> SortDatanodes does not return correct orders when many DNs on a given host
> --
>
> Key: HDDS-2249
> URL: https://issues.apache.org/jira/browse/HDDS-2249
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Stephen O'Donnell
>Priority: Major
>
> In HDDS-2199 ScmNodeManager.getNodeByAddress() was changed to return a list 
> of nodes rather than a single entry, to handle the case where many datanodes 
> are running on the same host.
> In SCMBlockProtocol.sortDatanodes(), the results returned from 
> getNodesByAddress are used to determine whether the client submitting the 
> request is running on a cluster node, and if it is, it attempts to sort the 
> datanodes by distance from the client machine.
> To do this, the code currently takes the first DatanodeDetails object 
> returned by getHostsByAddress and then compares it with the other passed in 
> nodes. If any of the passed nodes are equal to the client node (based on the 
> Java object ID) it returns a zero distance, otherwise the distance is 
> calculated.
> The sort is performed in NetworkTopologyImpl.sortByDistanceCost() which later 
> calls NetworkTopologyImpl.getDistanceCost() which is where the object 
> comparison is performed:
> {code}
> if ((node1 != null && node2 != null && node1.equals(node2)) ||
>  (node1 == null && node2 == null)) {
>  return 0;
> }
> {code}
> This does not always work when there are many datanodes on the same host: 
> the first node returned from getNodesByAddress() is guaranteed to be on the 
> same host as the client, but the list of passed datanodes may not include 
> that datanode instance.
> To fix this, we should probably have getDistanceCost() compare hostnames or 
> IPs, either as a second check or instead of the object equality; however, 
> this is not trivial to implement.
> The reason is that getDistanceCost() takes Node objects (not 
> DatanodeDetails), and a Node does not have an IP or hostname field. It does 
> have a getNetworkName method, which should return the hostname, but it is 
> overwritten by the host's UUID when it registers with the node manager, by 
> this line in NodeManager.register():
> datanodeDetails.setNetworkName(datanodeDetails.getUuidString());
>  
> Note this only affects test clusters where many DNs are on a single host, 
> and it does not cause any failures. The DNs may just be returned in a 
> less-than-ideal order.
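One possible shape of the fix, sketched with simplified stand-in types (the real Node interface exposes network names and locations rather than host fields, which is exactly why this is not trivial): compare nodes by host identity instead of Java object equality.

{code}
import java.util.Objects;

public final class DistanceCheck {
  // Simplified stand-in for the topology Node type.
  static final class Node {
    final String networkLocation; // e.g. "/default-rack"
    final String host;
    Node(String networkLocation, String host) {
      this.networkLocation = networkLocation;
      this.host = host;
    }
  }

  static int getDistanceCost(Node node1, Node node2) {
    if (node1 == null && node2 == null) {
      return 0;
    }
    // Zero distance when both entries resolve to the same host, instead of
    // requiring node1.equals(node2) object equality.
    if (node1 != null && node2 != null
        && Objects.equals(node1.host, node2.host)
        && Objects.equals(node1.networkLocation, node2.networkLocation)) {
      return 0;
    }
    return 2; // placeholder; the real code walks the topology tree
  }
}
{code}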






[jira] [Commented] (HDDS-2249) SortDatanodes does not return correct orders when many DNs on a given host

2019-11-14 Thread Sammi Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974047#comment-16974047
 ] 

Sammi Chen commented on HDDS-2249:
--

Thanks [~swagle] for reporting this.  One idea that comes to mind: how about 
using hostname:port as the key in dnsToUuidMap?  If it works, will it solve 
this issue? 

> SortDatanodes does not return correct orders when many DNs on a given host
> --
>
> Key: HDDS-2249
> URL: https://issues.apache.org/jira/browse/HDDS-2249
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Stephen O'Donnell
>Priority: Major
>
> In HDDS-2199 ScmNodeManager.getNodeByAddress() was changed to return a list 
> of nodes rather than a single entry, to handle the case where many datanodes 
> are running on the same host.
> In SCMBlockProtocol.sortDatanodes(), the results returned from 
> getNodesByAddress are used to determine whether the client submitting the 
> request is running on a cluster node, and if it is, it attempts to sort the 
> datanodes by distance from the client machine.
> To do this, the code currently takes the first DatanodeDetails object 
> returned by getHostsByAddress and then compares it with the other passed in 
> nodes. If any of the passed nodes are equal to the client node (based on the 
> Java object ID) it returns a zero distance, otherwise the distance is 
> calculated.
> The sort is performed in NetworkTopologyImpl.sortByDistanceCost() which later 
> calls NetworkTopologyImpl.getDistanceCost() which is where the object 
> comparison is performed:
> {code}
> if ((node1 != null && node2 != null && node1.equals(node2)) ||
>  (node1 == null && node2 == null)) {
>  return 0;
> }
> {code}
> This does not always work when there are many datanodes on the same host: 
> the first node returned from getNodesByAddress() is guaranteed to be on the 
> same host as the client, but the list of passed datanodes may not include 
> that datanode instance.
> To fix this, we should probably have getDistanceCost() compare hostnames or 
> IPs, either as a second check or instead of the object equality; however, 
> this is not trivial to implement.
> The reason is that getDistanceCost() takes Node objects (not 
> DatanodeDetails), and a Node does not have an IP or hostname field. It does 
> have a getNetworkName method, which should return the hostname, but it is 
> overwritten by the host's UUID when it registers with the node manager, by 
> this line in NodeManager.register():
> datanodeDetails.setNetworkName(datanodeDetails.getUuidString());
>  
> Note this only affects test clusters where many DNs are on a single host, 
> and it does not cause any failures. The DNs may just be returned in a 
> less-than-ideal order.






[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-11-10 Thread Sammi Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16971357#comment-16971357
 ] 

Sammi Chen commented on HDDS-2356:
--

Hi [~bharat],  thanks for helping fix the multipart upload issues.  Li and I 
are working on enabling Ozone in Tencent's production environment.  Currently 
we have two main blocking issues: one is this multipart upload, the other is 
performance.  Mukul and Shashi are helping us with the performance 
improvement.  This multipart upload issue happens consistently in our 
environment with big files, say 5GB in size.  It will be more efficient if you 
try to reproduce the case locally.  We would love to assist if you need any 
help reproducing it. 
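In case it helps with reproducing outside goofys, below is a minimal multipart upload against the S3 gateway using the AWS SDK for Java v1. The endpoint, credentials, bucket, and part count are assumptions for illustration; raise the part size and count to approach the ~5GB case.

{code}
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.*;

import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;

public class MultipartRepro {
  public static void main(String[] args) {
    // Assumed S3 gateway endpoint and credentials.
    AmazonS3 s3 = AmazonS3ClientBuilder.standard()
        .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(
            "http://localhost:9878", "us-east-1"))
        .withCredentials(new AWSStaticCredentialsProvider(
            new BasicAWSCredentials("accessKey", "secret")))
        .enablePathStyleAccess()
        .build();

    String bucket = "ozone-test";
    String key = "mpu-repro";
    String uploadId = s3.initiateMultipartUpload(
        new InitiateMultipartUploadRequest(bucket, key)).getUploadId();

    byte[] part = new byte[5 * 1024 * 1024]; // 5 MB minimum part size
    List<PartETag> etags = new ArrayList<>();
    for (int n = 1; n <= 3; n++) { // increase the count for larger objects
      UploadPartResult res = s3.uploadPart(new UploadPartRequest()
          .withBucketName(bucket).withKey(key).withUploadId(uploadId)
          .withPartNumber(n)
          .withInputStream(new ByteArrayInputStream(part))
          .withPartSize(part.length));
      etags.add(res.getPartETag());
    }
    s3.completeMultipartUpload(new CompleteMultipartUploadRequest(
        bucket, key, uploadId, etags));
  }
}
{code}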


> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Assignee: Bharat Viswanadham
>Priority: Blocker
> Fix For: 0.5.0
>
> Attachments: 2019-11-06_18_13_57_422_ERROR, hs_err_pid9340.log, 
> image-2019-10-31-18-56-56-177.png, om_audit_log_plc_1570863541668_9278.txt
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path 
> on VM0, while reading data from VM0 local disk and write to mount path. The 
> dataset has various sizes of files from 0 byte to GB-level and it has a 
> number of ~50,000 files. 
> The writing is slow (1GB for ~10 mins) and it stops after around 4GB. Looking 
> at the hadoop-root-om-VM_50_210_centos.out log, I see the OM throwing errors 
> related to multipart upload. This error eventually causes the writing to 
> terminate and the OM to shut down. 
>  
> Updated on 11/06/2019:
> See new multipart upload error NO_SUCH_MULTIPART_UPLOAD_ERROR and full logs 
> are in the attachment.
>  2019-11-05 18:12:37,766 ERROR 
> org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest:
>  MultipartUpload Commit is failed for Key:./2
> 0191012/plc_1570863541668_9278 in Volume/Bucket 
> s325d55ad283aa400af464c76d713c07ad/ozone-test
> NO_SUCH_MULTIPART_UPLOAD_ERROR 
> org.apache.hadoop.ozone.om.exceptions.OMException: No such Multipart upload 
> is with specified uploadId fcda8608-b431-48b7-8386-
> 0a332f1a709a-103084683261641950
> at 
> org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:1
> 56)
> at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.
> java:217)
> at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:132)
> at 
> org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72)
> at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:100)
> at 
> org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
>  
> Updated on 10/28/2019:
> See MISMATCH_MULTIPART_LIST error.
>  
> 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete 
> Multipart Upload Request for bucket: ozone-test, key: 
> 20191012/plc_1570863541668_927
>  8
>  MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: 
> Complete Multipart Upload Failed: volume: 
> s3c89e813c80ffcea9543004d57b2a1239bucket:
>  ozone-testkey: 20191012/plc_1570863541668_9278
>  at 
> org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732)
>  at 
> 

[jira] [Updated] (HDDS-1576) Support configure more than one raft log storage to host multiple pipelines

2019-11-04 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-1576:
-
Summary: Support configure more than one raft log storage to host multiple 
pipelines  (was: Support configure more than one raft log storage to host 
multiple pipeline)

> Support configure more than one raft log storage to host multiple pipelines
> ---
>
> Key: HDDS-1576
> URL: https://issues.apache.org/jira/browse/HDDS-1576
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Datanode, SCM
>Reporter: Siddharth Wagle
>Assignee: Sammi Chen
>Priority: Major
>
> Support configuring multiple raft log storage directories to host multiple 
> THREE-factor RATIS pipelines. 
> Unless the storage is fast media, the datanode should try its best to 
> allocate a different raft log storage directory for each new pipeline. 
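As a rough sketch of that allocation policy, with invented names (this is not the actual datanode code): pick the configured raft-log directory currently hosting the fewest pipelines, so that new pipelines spread across disks.

{code}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RaftLogDirAllocator {
  private final Map<String, Integer> pipelinesPerDir = new HashMap<>();

  public RaftLogDirAllocator(List<String> raftLogDirs) {
    for (String dir : raftLogDirs) {
      pipelinesPerDir.put(dir, 0);
    }
  }

  /** Returns the directory with the fewest pipelines and records the use. */
  public synchronized String allocate() {
    String best = null;
    for (Map.Entry<String, Integer> e : pipelinesPerDir.entrySet()) {
      if (best == null || e.getValue() < pipelinesPerDir.get(best)) {
        best = e.getKey();
      }
    }
    pipelinesPerDir.merge(best, 1, Integer::sum);
    return best;
  }
}
{code}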






[jira] [Updated] (HDDS-1576) Support configure more than one raft log storage to host multiple pipeline

2019-11-04 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-1576:
-
Description: 
Support configuring multiple raft log storage directories to host multiple 
THREE-factor RATIS pipelines. 
Unless the storage is fast media, the datanode should try its best to allocate 
a different raft log storage directory for each new pipeline. 


  was:
Support configure multiple raft 
SCM should not try to create a raft group by placing the raft log on a disk 
that is already used by existing Ratis ring for an open pipeline.

This constraint would have to be applied by either throwing an exception during 
pipeline creation or by looking at configs on the SCM side.


Ensure constraint of one raft log per disk is met unless fast media


> Support configure more than one raft log storage to host multiple pipeline
> --
>
> Key: HDDS-1576
> URL: https://issues.apache.org/jira/browse/HDDS-1576
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Datanode, SCM
>Reporter: Siddharth Wagle
>Assignee: Sammi Chen
>Priority: Major
>
> Support configuring multiple raft log storage directories to host multiple 
> THREE-factor RATIS pipelines. 
> Unless the storage is fast media, the datanode should try its best to 
> allocate a different raft log storage directory for each new pipeline. 






[jira] [Updated] (HDDS-1576) Support configure more than one raft storage to host multiple pipeline

2019-11-04 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-1576:
-
Description: 
Support configure multiple raft 
SCM should not try to create a raft group by placing the raft log on a disk 
that is already used by existing Ratis ring for an open pipeline.

This constraint would have to be applied by either throwing an exception during 
pipeline creation or by looking at configs on the SCM side.


Ensure constraint of one raft log per disk is met unless fast media

  was:
SCM should not try to create a raft group by placing the raft log on a disk 
that is already used by existing Ratis ring for an open pipeline.

This constraint would have to be applied by either throwing an exception during 
pipeline creation or by looking at configs on the SCM side.


Ensure constraint of one raft log per disk is met unless fast media


> Support configure more than one raft storage to host multiple pipeline
> --
>
> Key: HDDS-1576
> URL: https://issues.apache.org/jira/browse/HDDS-1576
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Datanode, SCM
>Reporter: Siddharth Wagle
>Assignee: Sammi Chen
>Priority: Major
>
> Support configure multiple raft 
> SCM should not try to create a raft group by placing the raft log on a disk 
> that is already used by existing Ratis ring for an open pipeline.
> This constraint would have to be applied by either throwing an exception 
> during pipeline creation or by looking at configs on the SCM side.
> Ensure constraint of one raft log per disk is met unless fast media






[jira] [Updated] (HDDS-1576) Support configure more than one raft log storage to host multiple pipeline

2019-11-04 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-1576:
-
Summary: Support configure more than one raft log storage to host multiple 
pipeline  (was: Support configure more than one raft storage to host multiple 
pipeline)

> Support configure more than one raft log storage to host multiple pipeline
> --
>
> Key: HDDS-1576
> URL: https://issues.apache.org/jira/browse/HDDS-1576
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Datanode, SCM
>Reporter: Siddharth Wagle
>Assignee: Sammi Chen
>Priority: Major
>
> Support configure multiple raft 
> SCM should not try to create a raft group by placing the raft log on a disk 
> that is already used by existing Ratis ring for an open pipeline.
> This constraint would have to be applied by either throwing an exception 
> during pipeline creation or by looking at configs on the SCM side.
> Ensure constraint of one raft log per disk is met unless fast media






[jira] [Updated] (HDDS-1576) Support configure more than one raft storage to host multiple pipeline

2019-11-04 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-1576:
-
Summary: Support configure more than one raft storage to host multiple 
pipeline  (was: Ensure constraint of one raft log per disk is met unless fast 
media)

> Support configure more than one raft storage to host multiple pipeline
> --
>
> Key: HDDS-1576
> URL: https://issues.apache.org/jira/browse/HDDS-1576
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Datanode, SCM
>Reporter: Siddharth Wagle
>Assignee: Sammi Chen
>Priority: Major
>
> SCM should not try to create a raft group by placing the raft log on a disk 
> that is already used by existing Ratis ring for an open pipeline.
> This constraint would have to be applied by either throwing an exception 
> during pipeline creation or by looking at configs on the SCM side.
> Ensure constraint of one raft log per disk is met unless fast media






[jira] [Updated] (HDDS-1576) Ensure constraint of one raft log per disk is met unless fast media

2019-11-04 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-1576:
-
Description: 
SCM should not try to create a raft group by placing the raft log on a disk 
that is already used by existing Ratis ring for an open pipeline.

This constraint would have to be applied by either throwing an exception during 
pipeline creation or by looking at configs on the SCM side.


Ensure constraint of one raft log per disk is met unless fast media

  was:
SCM should not try to create a raft group by placing the raft log on a disk 
that is already used by existing Ratis ring for an open pipeline.

This constraint would have to be applied by either throwing an exception during 
pipeline creation or by looking at configs on the SCM side.


> Ensure constraint of one raft log per disk is met unless fast media
> ---
>
> Key: HDDS-1576
> URL: https://issues.apache.org/jira/browse/HDDS-1576
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Datanode, SCM
>Reporter: Siddharth Wagle
>Assignee: Sammi Chen
>Priority: Major
>
> SCM should not try to create a raft group by placing the raft log on a disk 
> that is already used by existing Ratis ring for an open pipeline.
> This constraint would have to be applied by either throwing an exception 
> during pipeline creation or by looking at configs on the SCM side.
> Ensure constraint of one raft log per disk is met unless fast media






[jira] [Assigned] (HDDS-1576) Ensure constraint of one raft log per disk is met unless fast media

2019-11-01 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen reassigned HDDS-1576:


Assignee: Sammi Chen  (was: Li Cheng)

> Ensure constraint of one raft log per disk is met unless fast media
> ---
>
> Key: HDDS-1576
> URL: https://issues.apache.org/jira/browse/HDDS-1576
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Datanode, SCM
>Reporter: Siddharth Wagle
>Assignee: Sammi Chen
>Priority: Major
>
> SCM should not try to create a raft group by placing the raft log on a disk 
> that is already used by existing Ratis ring for an open pipeline.
> This constraint would have to be applied by either throwing an exception 
> during pipeline creation or by looking at configs on the SCM side.






[jira] [Resolved] (HDDS-2376) Fail to read data through XceiverClientGrpc

2019-11-01 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen resolved HDDS-2376.
--
Resolution: Not A Bug

> Fail to read data through XceiverClientGrpc
> ---
>
> Key: HDDS-2376
> URL: https://issues.apache.org/jira/browse/HDDS-2376
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Sammi Chen
>Assignee: Hanisha Koneru
>Priority: Blocker
>
> Ran teragen; the application failed with the following stack trace:
> 19/10/29 14:35:42 INFO mapreduce.Job: Running job: job_1567133159094_0048
> 19/10/29 14:35:59 INFO mapreduce.Job: Job job_1567133159094_0048 running in 
> uber mode : false
> 19/10/29 14:35:59 INFO mapreduce.Job:  map 0% reduce 0%
> 19/10/29 14:35:59 INFO mapreduce.Job: Job job_1567133159094_0048 failed with 
> state FAILED due to: Application application_1567133159094_0048 failed 2 
> times due to AM Container for appattempt_1567133159094_0048_02 exited 
> with  exitCode: -1000
> For more detailed output, check application tracking 
> page:http://host183:8088/cluster/app/application_1567133159094_0048Then, 
> click on links to logs of each attempt.
> Diagnostics: Unexpected OzoneException: 
> org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at 
> index 0
> java.io.IOException: Unexpected OzoneException: 
> org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at 
> index 0
>   at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:342)
>   at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkFromContainer(ChunkInputStream.java:307)
>   at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.prepareRead(ChunkInputStream.java:259)
>   at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.read(ChunkInputStream.java:144)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:239)
>   at 
> org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171)
>   at 
> org.apache.hadoop.fs.ozone.OzoneFSInputStream.read(OzoneFSInputStream.java:52)
>   at java.io.DataInputStream.read(DataInputStream.java:100)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:86)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:60)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:120)
>   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)
>   at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:267)
>   at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
>   at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
>   at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:359)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum 
> mismatch at index 0
>   at 
> org.apache.hadoop.ozone.common.ChecksumData.verifyChecksumDataMatches(ChecksumData.java:148)
>   at 
> org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:275)
>   at 
> org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:238)
>   at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.lambda$new$0(ChunkInputStream.java:375)
>   at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:287)
>   at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:250)
>   at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:233)
>   at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:245)
>   at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:335)
>   ... 26 more
> Caused by: Checksum mismatch at index 0
> org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at 
> index 0
>   at 
> 

[jira] [Comment Edited] (HDDS-2376) Fail to read data through XceiverClientGrpc

2019-11-01 Thread Sammi Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964661#comment-16964661
 ] 

Sammi Chen edited comment on HDDS-2376 at 11/1/19 6:56 AM:
---

The root cause is that I didn't restart Hadoop 2.7.5 after I deployed the 
latest Ozone binary, so Hadoop was still using an old version of the Ozone 
client (from two months earlier).  This OzoneChecksumException is thrown by 
the NodeManager.  Logs attached.  It seems something changed on the Ozone 
server side that makes an old-version Ozone client unable to verify the data 
it wrote itself. 

[~msingh] and [~hanishakoneru], thanks for paying attention to this issue. I 
will close it now. 

2019-11-01 11:46:02,230 ERROR org.apache.hadoop.hdds.scm.XceiverClientGrpc: 
Failed to execute command cmdType: ReadChunk
traceID: ""
containerID: 1145
datanodeUuid: "ed90869c-317e-4303-8922-9fa83a3983cb"
readChunk {
  blockID {
containerID: 1145
localID: 103060600027086850
blockCommitSequenceId: 948
  }
  chunkData {
chunkName: "103060600027086850_chunk_1"
offset: 0
len: 245
checksumData {
  type: CRC32
  bytesPerChecksum: 1048576
  checksums: "\247\304Yf"
}
  }
}
 on datanode 1da74a1d-f64d-4ad4-b04c-85f26687e683
org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at 
index 0
at 
org.apache.hadoop.ozone.common.ChecksumData.verifyChecksumDataMatches(ChecksumData.java:148)
at 
org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:275)
at 
org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:238)
at 
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.lambda$new$0(ChunkInputStream.java:375)
at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:287)
at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:250)
at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:233)
at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:245)
at 
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:335)
at 
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkFromContainer(ChunkInputStream.java:307)
at 
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.prepareRead(ChunkInputStream.java:259)
at 
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.read(ChunkInputStream.java:144)
at 
org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:239)
at 
org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171)
at 
org.apache.hadoop.fs.ozone.OzoneFSInputStream.read(OzoneFSInputStream.java:52)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:86)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:60)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:120)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:267)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:359)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2019-11-01 11:46:02,243 ERROR org.apache.hadoop.hdds.scm.XceiverClientGrpc: 
Failed to execute command cmdType: ReadChunk
traceID: ""
containerID: 1145
datanodeUuid: "ed90869c-317e-4303-8922-9fa83a3983cb"
readChunk {
  blockID {
containerID: 1145
localID: 103060600027086850
blockCommitSequenceId: 948
  }
  chunkData {
chunkName: "103060600027086850_chunk_1"
offset: 0
len: 245
checksumData {
  type: CRC32
  bytesPerChecksum: 1048576
  checksums: "\247\304Yf"
}
  }
}
 on datanode ed90869c-317e-4303-8922-9fa83a3983cb
org.apache.hadoop.ozone.common.OzoneChecksumException: 

[jira] [Commented] (HDDS-2376) Fail to read data through XceiverClientGrpc

2019-11-01 Thread Sammi Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964661#comment-16964661
 ] 

Sammi Chen commented on HDDS-2376:
--

The root cause is that I didn't restart Hadoop 2.7.5 after I deployed the 
latest Ozone binary, so Hadoop was still using an old version of the Ozone 
client (from two months earlier).  This OzoneChecksumException is thrown by 
the NodeManager.  Logs attached.  It seems something changed on the Ozone 
server side that makes an old-version Ozone client unable to verify the data 
it wrote itself. 


2019-11-01 11:46:02,230 ERROR org.apache.hadoop.hdds.scm.XceiverClientGrpc: 
Failed to execute command cmdType: ReadChunk
traceID: ""
containerID: 1145
datanodeUuid: "ed90869c-317e-4303-8922-9fa83a3983cb"
readChunk {
  blockID {
containerID: 1145
localID: 103060600027086850
blockCommitSequenceId: 948
  }
  chunkData {
chunkName: "103060600027086850_chunk_1"
offset: 0
len: 245
checksumData {
  type: CRC32
  bytesPerChecksum: 1048576
  checksums: "\247\304Yf"
}
  }
}
 on datanode 1da74a1d-f64d-4ad4-b04c-85f26687e683
org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at 
index 0
at 
org.apache.hadoop.ozone.common.ChecksumData.verifyChecksumDataMatches(ChecksumData.java:148)
at 
org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:275)
at 
org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:238)
at 
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.lambda$new$0(ChunkInputStream.java:375)
at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:287)
at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:250)
at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:233)
at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:245)
at 
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:335)
at 
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkFromContainer(ChunkInputStream.java:307)
at 
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.prepareRead(ChunkInputStream.java:259)
at 
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.read(ChunkInputStream.java:144)
at 
org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:239)
at 
org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171)
at 
org.apache.hadoop.fs.ozone.OzoneFSInputStream.read(OzoneFSInputStream.java:52)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:86)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:60)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:120)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:267)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:359)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2019-11-01 11:46:02,243 ERROR org.apache.hadoop.hdds.scm.XceiverClientGrpc: 
Failed to execute command cmdType: ReadChunk
traceID: ""
containerID: 1145
datanodeUuid: "ed90869c-317e-4303-8922-9fa83a3983cb"
readChunk {
  blockID {
containerID: 1145
localID: 103060600027086850
blockCommitSequenceId: 948
  }
  chunkData {
chunkName: "103060600027086850_chunk_1"
offset: 0
len: 245
checksumData {
  type: CRC32
  bytesPerChecksum: 1048576
  checksums: "\247\304Yf"
}
  }
}
 on datanode ed90869c-317e-4303-8922-9fa83a3983cb
org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at 
index 0
at 
org.apache.hadoop.ozone.common.ChecksumData.verifyChecksumDataMatches(ChecksumData.java:148)

[jira] [Resolved] (HDDS-2363) Failed to create Ratis container

2019-10-31 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen resolved HDDS-2363.
--
Resolution: Fixed

> Failed to create Ratis container
> 
>
> Key: HDDS-2363
> URL: https://issues.apache.org/jira/browse/HDDS-2363
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Error logs:
> 2019-10-29 10:24:59,553 [pool-7-thread-1] ERROR  - 
> org.rocksdb.RocksDBException Failed init RocksDB, db path : 
> /data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db,
>  exception 
> :/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db:
>  does not exist (create_if_missing is false)
> CACHED_OPTS is a RocksDB options cache in MetadataStoreBuilder.  The cache 
> keeps the old RocksDB options, which are not refreshed with new option 
> values on subsequent calls. 
> The following logs didn't reveal the true cause of the write failure.  Will 
> improve these logs too. 
> 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO   - Operation: 
> CreateContainer : Trace ID:  : Message: Container creation failed. : Result: 
> CONTAINER_INTERNAL_ERROR
> 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO   - Operation: WriteChunk 
> : Trace ID:  : Message: ContainerID 402 creation failed : Result: 
> CONTAINER_INTERNAL_ERROR
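Below is a sketch of the bug pattern with simplified names (CACHED_OPTS lives in the real MetadataStoreBuilder; the cache key and shape here are illustrative): an Options object cached with createIfMissing=false silently overrides a later caller that needs it true.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class OptionsCacheBugSketch {
  private static final Map<String, Options> CACHED_OPTS =
      new ConcurrentHashMap<>();

  static RocksDB open(String dbPath, boolean createIfMissing)
      throws RocksDBException {
    // BUG: whichever settings the first caller used are frozen into the
    // cache; a later createIfMissing=true request gets the stale Options
    // and RocksDB.open fails with "does not exist (create_if_missing is
    // false)".
    Options opts = CACHED_OPTS.computeIfAbsent("default",
        k -> new Options().setCreateIfMissing(createIfMissing));
    return RocksDB.open(opts, dbPath);
    // Fix direction: include the option values in the cache key, or
    // refresh the cached Options with the caller's settings before use.
  }
}
{code}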






[jira] [Updated] (HDDS-2363) Failed to create Ratis container

2019-10-31 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-2363:
-
Fix Version/s: 0.5.0

> Failed to create Ratis container
> 
>
> Key: HDDS-2363
> URL: https://issues.apache.org/jira/browse/HDDS-2363
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Error logs:
> 2019-10-29 10:24:59,553 [pool-7-thread-1] ERROR  - 
> org.rocksdb.RocksDBException Failed init RocksDB, db path : 
> /data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db,
>  exception 
> :/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db:
>  does not exist (create_if_missing is false)
> CACHED_OPTS is a RocksDB options cache in MetadataStoreBuilder.  The cache 
> keeps the old RocksDB options, which are not refreshed with new option 
> values on subsequent calls. 
> The following logs didn't reveal the true cause of the write failure.  Will 
> improve these logs too. 
> 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO   - Operation: 
> CreateContainer : Trace ID:  : Message: Container creation failed. : Result: 
> CONTAINER_INTERNAL_ERROR
> 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO   - Operation: WriteChunk 
> : Trace ID:  : Message: ContainerID 402 creation failed : Result: 
> CONTAINER_INTERNAL_ERROR






[jira] [Updated] (HDDS-2363) Failed to create Ratis container

2019-10-31 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-2363:
-
Summary: Failed to create Ratis container  (was: Fail to create Ratis 
container)

> Failed to create Ratis container
> 
>
> Key: HDDS-2363
> URL: https://issues.apache.org/jira/browse/HDDS-2363
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Error logs:
> 2019-10-29 10:24:59,553 [pool-7-thread-1] ERROR  - 
> org.rocksdb.RocksDBException Failed init RocksDB, db path : 
> /data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db,
>  exception 
> :/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db:
>  does not exist (create_if_missing is false)
> CACHED_OPTS is a RocksDB options cache in MetadataStoreBuilder.  The cache 
> keeps the old RocksDB options, which are not refreshed with new option 
> values on subsequent calls. 
> The following logs didn't reveal the true cause of the write failure.  Will 
> improve these logs too. 
> 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO   - Operation: 
> CreateContainer : Trace ID:  : Message: Container creation failed. : Result: 
> CONTAINER_INTERNAL_ERROR
> 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO   - Operation: WriteChunk 
> : Trace ID:  : Message: ContainerID 402 creation failed : Result: 
> CONTAINER_INTERNAL_ERROR






[jira] [Commented] (HDDS-2376) Fail to read data through XceiverClientGrpc

2019-10-31 Thread Sammi Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963727#comment-16963727
 ] 

Sammi Chen commented on HDDS-2376:
--

Hi [~msingh] and [~hanishakoneru],  I don't find any WARN or ERROR logs on the 
OM, SCM, or datanodes.  I will add more logging to collect more info. 

> Fail to read data through XceiverClientGrpc
> ---
>
> Key: HDDS-2376
> URL: https://issues.apache.org/jira/browse/HDDS-2376
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Sammi Chen
>Assignee: Hanisha Koneru
>Priority: Blocker
>
> Ran teragen; the application failed with the following stack trace:
> 19/10/29 14:35:42 INFO mapreduce.Job: Running job: job_1567133159094_0048
> 19/10/29 14:35:59 INFO mapreduce.Job: Job job_1567133159094_0048 running in 
> uber mode : false
> 19/10/29 14:35:59 INFO mapreduce.Job:  map 0% reduce 0%
> 19/10/29 14:35:59 INFO mapreduce.Job: Job job_1567133159094_0048 failed with 
> state FAILED due to: Application application_1567133159094_0048 failed 2 
> times due to AM Container for appattempt_1567133159094_0048_02 exited 
> with  exitCode: -1000
> For more detailed output, check application tracking 
> page:http://host183:8088/cluster/app/application_1567133159094_0048Then, 
> click on links to logs of each attempt.
> Diagnostics: Unexpected OzoneException: 
> org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at 
> index 0
> java.io.IOException: Unexpected OzoneException: 
> org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at 
> index 0
>   at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:342)
>   at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkFromContainer(ChunkInputStream.java:307)
>   at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.prepareRead(ChunkInputStream.java:259)
>   at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.read(ChunkInputStream.java:144)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:239)
>   at 
> org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171)
>   at 
> org.apache.hadoop.fs.ozone.OzoneFSInputStream.read(OzoneFSInputStream.java:52)
>   at java.io.DataInputStream.read(DataInputStream.java:100)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:86)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:60)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:120)
>   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)
>   at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:267)
>   at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
>   at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
>   at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:359)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum 
> mismatch at index 0
>   at 
> org.apache.hadoop.ozone.common.ChecksumData.verifyChecksumDataMatches(ChecksumData.java:148)
>   at 
> org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:275)
>   at 
> org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:238)
>   at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.lambda$new$0(ChunkInputStream.java:375)
>   at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:287)
>   at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:250)
>   at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:233)
>   at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:245)
>   at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:335)
>   ... 26 more
> 

[jira] [Updated] (HDDS-2376) Fail to read data through XceiverClientGrpc

2019-10-29 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-2376:
-
Description: 
Ran teragen; the application failed with the following stack: 

19/10/29 14:35:42 INFO mapreduce.Job: Running job: job_1567133159094_0048
19/10/29 14:35:59 INFO mapreduce.Job: Job job_1567133159094_0048 running in 
uber mode : false
19/10/29 14:35:59 INFO mapreduce.Job:  map 0% reduce 0%
19/10/29 14:35:59 INFO mapreduce.Job: Job job_1567133159094_0048 failed with 
state FAILED due to: Application application_1567133159094_0048 failed 2 times 
due to AM Container for appattempt_1567133159094_0048_02 exited with  
exitCode: -1000
For more detailed output, check application tracking 
page:http://host183:8088/cluster/app/application_1567133159094_0048Then, click 
on links to logs of each attempt.
Diagnostics: Unexpected OzoneException: 
org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at 
index 0
java.io.IOException: Unexpected OzoneException: 
org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at 
index 0
at 
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:342)
at 
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkFromContainer(ChunkInputStream.java:307)
at 
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.prepareRead(ChunkInputStream.java:259)
at 
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.read(ChunkInputStream.java:144)
at 
org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:239)
at 
org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171)
at 
org.apache.hadoop.fs.ozone.OzoneFSInputStream.read(OzoneFSInputStream.java:52)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:86)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:60)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:120)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:267)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:359)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum 
mismatch at index 0
at 
org.apache.hadoop.ozone.common.ChecksumData.verifyChecksumDataMatches(ChecksumData.java:148)
at 
org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:275)
at 
org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:238)
at 
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.lambda$new$0(ChunkInputStream.java:375)
at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:287)
at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:250)
at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:233)
at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:245)
at 
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:335)
... 26 more
Caused by: Checksum mismatch at index 0
org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at 
index 0
at 
org.apache.hadoop.ozone.common.ChecksumData.verifyChecksumDataMatches(ChecksumData.java:148)
at 
org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:275)
at 
org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:238)
at 
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.lambda$new$0(ChunkInputStream.java:375)
at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:287)
at 

[jira] [Created] (HDDS-2376) Fail to read data through XceiverClientGrpc

2019-10-29 Thread Sammi Chen (Jira)
Sammi Chen created HDDS-2376:


 Summary: Fail to read data through XceiverClientGrpc
 Key: HDDS-2376
 URL: https://issues.apache.org/jira/browse/HDDS-2376
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Sammi Chen


Ran teragen; the application failed with the following stack: 

19/10/29 14:35:42 INFO mapreduce.Job: Running job: job_1567133159094_0048
19/10/29 14:35:59 INFO mapreduce.Job: Job job_1567133159094_0048 running in 
uber mode : false
19/10/29 14:35:59 INFO mapreduce.Job:  map 0% reduce 0%
19/10/29 14:35:59 INFO mapreduce.Job: Job job_1567133159094_0048 failed with 
state FAILED due to: Application application_1567133159094_0048 failed 2 times 
due to AM Container for appattempt_1567133159094_0048_02 exited with  
exitCode: -1000
For more detailed output, check application tracking 
page:http://host183:8088/cluster/app/application_1567133159094_0048Then, click 
on links to logs of each attempt.
Diagnostics: Unexpected OzoneException: 
org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at 
index 0
java.io.IOException: Unexpected OzoneException: 
org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at 
index 0
at 
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:342)
at 
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkFromContainer(ChunkInputStream.java:307)
at 
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.prepareRead(ChunkInputStream.java:259)
at 
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.read(ChunkInputStream.java:144)
at 
org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:239)
at 
org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171)
at 
org.apache.hadoop.fs.ozone.OzoneFSInputStream.read(OzoneFSInputStream.java:52)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:86)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:60)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:120)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:267)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:359)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum 
mismatch at index 0
at 
org.apache.hadoop.ozone.common.ChecksumData.verifyChecksumDataMatches(ChecksumData.java:148)
at 
org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:275)
at 
org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:238)
at 
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.lambda$new$0(ChunkInputStream.java:375)
at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:287)
at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:250)
at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:233)
at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:245)
at 
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:335)
... 26 more
Caused by: Checksum mismatch at index 0
org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at 
index 0
at 
org.apache.hadoop.ozone.common.ChecksumData.verifyChecksumDataMatches(ChecksumData.java:148)
at 
org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:275)
at 
org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:238)
at 

[jira] [Updated] (HDDS-2363) Fail to create Ratis container

2019-10-29 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-2363:
-
Priority: Blocker  (was: Critical)

> Fail to create Ratis container
> --
>
> Key: HDDS-2363
> URL: https://issues.apache.org/jira/browse/HDDS-2363
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Error logs:
> 2019-10-29 10:24:59,553 [pool-7-thread-1] ERROR  - 
> org.rocksdb.RocksDBException Failed init RocksDB, db path : 
> /data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db,
>  exception 
> :/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db:
>  does not exist (create_if_missing is false)
> CACHED_OPTS is a RocksDB options cache in MetadataStoreBuilder.  The cache 
> keeps the old RocksDB options, which are not refreshed with new option 
> values on subsequent calls. 
> Logs such as the following didn't reveal the true cause of the write 
> failure.  Will improve these logs too. 
> 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO   - Operation: 
> CreateContainer : Trace ID:  : Message: Container creation failed. : Result: 
> CONTAINER_INTERNAL_ERROR
> 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO   - Operation: WriteChunk 
> : Trace ID:  : Message: ContainerID 402 creation failed : Result: 
> CONTAINER_INTERNAL_ERROR



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2363) Fail to create Ratis container

2019-10-29 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-2363:
-
Description: 
Error logs:
2019-10-29 10:24:59,553 [pool-7-thread-1] ERROR  - 
org.rocksdb.RocksDBException Failed init RocksDB, db path : 
/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db,
 exception 
:/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db:
 does not exist (create_if_missing is false)

CACHED_OPTS is a RocksDB options cache in MetadataStoreBuilder.  The cache 
keeps the old RocksDB options, which are not refreshed with new option values 
on subsequent calls. 

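As a minimal sketch of that pitfall (only CACHED_OPTS and MetadataStoreBuilder 
are names from this report; everything else is illustrative):

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class MetadataStoreBuilderSketch {
  static class Options {
    boolean createIfMissing;
    Options setCreateIfMissing(boolean b) { createIfMissing = b; return this; }
  }

  // Cached per DB profile; an entry keeps whatever flags its first caller set.
  private static final Map<String, Options> CACHED_OPTS =
      new ConcurrentHashMap<>();

  Options getOptions(String profile, boolean createIfMissing) {
    // BUG: once an entry exists, the new createIfMissing value is ignored,
    // so a stale createIfMissing=false is handed to RocksDB and the open
    // fails with "does not exist (create_if_missing is false)".
    return CACHED_OPTS.computeIfAbsent(profile,
        p -> new Options().setCreateIfMissing(createIfMissing));
  }
}
{code}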
Logs such as the following didn't reveal the true cause of the write failure.  
Will improve these logs too. 

2019-10-24 17:43:53,460 [pool-7-thread-1] INFO   - Operation: 
CreateContainer : Trace ID:  : Message: Container creation failed. : Result: 
CONTAINER_INTERNAL_ERROR
2019-10-24 17:43:53,478 [pool-7-thread-1] INFO   - Operation: WriteChunk : 
Trace ID:  : Message: ContainerID 402 creation failed : Result: 
CONTAINER_INTERNAL_ERROR

  was:
Error logs:
2019-10-29 10:24:59,553 [pool-7-thread-1] ERROR  - 
org.rocksdb.RocksDBException Failed init RocksDB, db path : 
/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db,
 exception 
:/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db:
 does not exist (create_if_missing is false)

Logs such as the following didn't reveal the true cause of the write failure.  
Will improve these logs too. 

2019-10-24 17:43:53,460 [pool-7-thread-1] INFO   - Operation: 
CreateContainer : Trace ID:  : Message: Container creation failed. : Result: 
CONTAINER_INTERNAL_ERROR
2019-10-24 17:43:53,478 [pool-7-thread-1] INFO   - Operation: WriteChunk : 
Trace ID:  : Message: ContainerID 402 creation failed : Result: 
CONTAINER_INTERNAL_ERROR


> Fail to create Ratis container
> --
>
> Key: HDDS-2363
> URL: https://issues.apache.org/jira/browse/HDDS-2363
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Error logs:
> 2019-10-29 10:24:59,553 [pool-7-thread-1] ERROR  - 
> org.rocksdb.RocksDBException Failed init RocksDB, db path : 
> /data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db,
>  exception 
> :/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db:
>  does not exist (create_if_missing is false)
> CACHED_OPTS is a RocksDB options cache in MetadataStoreBuilder.  The cache 
> keeps the old RocksDB options, which are not refreshed with new option 
> values on subsequent calls. 
> Logs such as the following didn't reveal the true cause of the write 
> failure.  Will improve these logs too. 
> 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO   - Operation: 
> CreateContainer : Trace ID:  : Message: Container creation failed. : Result: 
> CONTAINER_INTERNAL_ERROR
> 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO   - Operation: WriteChunk 
> : Trace ID:  : Message: ContainerID 402 creation failed : Result: 
> CONTAINER_INTERNAL_ERROR



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2363) Fail to create Ratis container

2019-10-28 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-2363:
-
Summary: Fail to create Ratis container  (was: Fail to create Ratis 
pipeline )

> Fail to create Ratis container
> --
>
> Key: HDDS-2363
> URL: https://issues.apache.org/jira/browse/HDDS-2363
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Critical
>
> Error logs:
> 2019-10-29 10:24:59,553 [pool-7-thread-1] ERROR  - 
> org.rocksdb.RocksDBException Failed init RocksDB, db path : 
> /data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db,
>  exception 
> :/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db:
>  does not exist (create_if_missing is false)
> Logs such as the following didn't reveal the true cause of the write 
> failure.  Will improve these logs too. 
> 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO   - Operation: 
> CreateContainer : Trace ID:  : Message: Container creation failed. : Result: 
> CONTAINER_INTERNAL_ERROR
> 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO   - Operation: WriteChunk 
> : Trace ID:  : Message: ContainerID 402 creation failed : Result: 
> CONTAINER_INTERNAL_ERROR



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2363) Fail to create Ratis pipeline

2019-10-28 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-2363:
-
Description: 
Error logs:
2019-10-29 10:24:59,553 [pool-7-thread-1] ERROR  - 
org.rocksdb.RocksDBException Failed init RocksDB, db path : 
/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db,
 exception 
:/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db:
 does not exist (create_if_missing is false)

Logs such as the following didn't reveal the true cause of the write failure.  
Will improve these logs too. 

2019-10-24 17:43:53,460 [pool-7-thread-1] INFO   - Operation: 
CreateContainer : Trace ID:  : Message: Container creation failed. : Result: 
CONTAINER_INTERNAL_ERROR
2019-10-24 17:43:53,478 [pool-7-thread-1] INFO   - Operation: WriteChunk : 
Trace ID:  : Message: ContainerID 402 creation failed : Result: 
CONTAINER_INTERNAL_ERROR

  was:
Logs such as the following didn't reveal the true cause of the write failure. 

2019-10-24 17:43:53,460 [pool-7-thread-1] INFO   - Operation: 
CreateContainer : Trace ID:  : Message: Container creation failed. : Result: 
CONTAINER_INTERNAL_ERROR
2019-10-24 17:43:53,478 [pool-7-thread-1] INFO   - Operation: WriteChunk : 
Trace ID:  : Message: ContainerID 402 creation failed : Result: 
CONTAINER_INTERNAL_ERROR


> Fail to create Ratis pipeline 
> --
>
> Key: HDDS-2363
> URL: https://issues.apache.org/jira/browse/HDDS-2363
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Major
>
> Error logs:
> 2019-10-29 10:24:59,553 [pool-7-thread-1] ERROR  - 
> org.rocksdb.RocksDBException Failed init RocksDB, db path : 
> /data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db,
>  exception 
> :/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db:
>  does not exist (create_if_missing is false)
> Logs such as the following didn't reveal the true cause of the write 
> failure.  Will improve these logs too. 
> 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO   - Operation: 
> CreateContainer : Trace ID:  : Message: Container creation failed. : Result: 
> CONTAINER_INTERNAL_ERROR
> 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO   - Operation: WriteChunk 
> : Trace ID:  : Message: ContainerID 402 creation failed : Result: 
> CONTAINER_INTERNAL_ERROR



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2363) Fail to create Ratis pipeline

2019-10-28 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-2363:
-
Priority: Critical  (was: Major)

> Fail to create Ratis pipeline 
> --
>
> Key: HDDS-2363
> URL: https://issues.apache.org/jira/browse/HDDS-2363
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Critical
>
> Error logs:
> 2019-10-29 10:24:59,553 [pool-7-thread-1] ERROR  - 
> org.rocksdb.RocksDBException Failed init RocksDB, db path : 
> /data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db,
>  exception 
> :/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db:
>  does not exist (create_if_missing is false)
> Logs such as the following didn't reveal the true cause of the write 
> failure.  Will improve these logs too. 
> 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO   - Operation: 
> CreateContainer : Trace ID:  : Message: Container creation failed. : Result: 
> CONTAINER_INTERNAL_ERROR
> 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO   - Operation: WriteChunk 
> : Trace ID:  : Message: ContainerID 402 creation failed : Result: 
> CONTAINER_INTERNAL_ERROR



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2363) Fail to create Ratis pipeline

2019-10-28 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-2363:
-
Summary: Fail to create Ratis pipeline   (was: Improve datanode write 
failure log)

> Fail to create Ratis pipeline 
> --
>
> Key: HDDS-2363
> URL: https://issues.apache.org/jira/browse/HDDS-2363
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Major
>
> Logs such as the following didn't reveal the true cause of the write failure. 
> 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO   - Operation: 
> CreateContainer : Trace ID:  : Message: Container creation failed. : Result: 
> CONTAINER_INTERNAL_ERROR
> 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO   - Operation: WriteChunk 
> : Trace ID:  : Message: ContainerID 402 creation failed : Result: 
> CONTAINER_INTERNAL_ERROR



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2363) Fail to create Ratis pipeline

2019-10-28 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-2363:
-
Issue Type: Bug  (was: Improvement)

> Fail to create Ratis pipeline 
> --
>
> Key: HDDS-2363
> URL: https://issues.apache.org/jira/browse/HDDS-2363
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Major
>
> Logs such as the following didn't reveal the true cause of the write failure. 
> 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO   - Operation: 
> CreateContainer : Trace ID:  : Message: Container creation failed. : Result: 
> CONTAINER_INTERNAL_ERROR
> 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO   - Operation: WriteChunk 
> : Trace ID:  : Message: ContainerID 402 creation failed : Result: 
> CONTAINER_INTERNAL_ERROR



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2363) Improve datanode write failure log

2019-10-24 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-2363:
-
Description: 
Logs such as the following didn't reveal the true cause of the write failure. 

2019-10-24 17:43:53,460 [pool-7-thread-1] INFO   - Operation: 
CreateContainer : Trace ID:  : Message: Container creation failed. : Result: 
CONTAINER_INTERNAL_ERROR
2019-10-24 17:43:53,478 [pool-7-thread-1] INFO   - Operation: WriteChunk : 
Trace ID:  : Message: ContainerID 402 creation failed : Result: 
CONTAINER_INTERNAL_ERROR

  was:
Logs such as the following haven't revealed the true cause of the write failure. 

2019-10-24 17:43:53,460 [pool-7-thread-1] INFO   - Operation: 
CreateContainer : Trace ID:  : Message: Container creation failed. : Result: 
CONTAINER_INTERNAL_ERROR
2019-10-24 17:43:53,478 [pool-7-thread-1] INFO   - Operation: WriteChunk : 
Trace ID:  : Message: ContainerID 402 creation failed : Result: 
CONTAINER_INTERNAL_ERROR


> Improve datanode write failure log
> --
>
> Key: HDDS-2363
> URL: https://issues.apache.org/jira/browse/HDDS-2363
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Major
>
> Logs such as the following didn't reveal the true cause of the write failure. 
> 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO   - Operation: 
> CreateContainer : Trace ID:  : Message: Container creation failed. : Result: 
> CONTAINER_INTERNAL_ERROR
> 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO   - Operation: WriteChunk 
> : Trace ID:  : Message: ContainerID 402 creation failed : Result: 
> CONTAINER_INTERNAL_ERROR



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2363) Improve datanode write failure log

2019-10-24 Thread Sammi Chen (Jira)
Sammi Chen created HDDS-2363:


 Summary: Improve datanode write failure log
 Key: HDDS-2363
 URL: https://issues.apache.org/jira/browse/HDDS-2363
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Datanode
Reporter: Sammi Chen
Assignee: Sammi Chen


Logs such as the following haven't revealed the true cause of the write failure. 

2019-10-24 17:43:53,460 [pool-7-thread-1] INFO   - Operation: 
CreateContainer : Trace ID:  : Message: Container creation failed. : Result: 
CONTAINER_INTERNAL_ERROR
2019-10-24 17:43:53,478 [pool-7-thread-1] INFO   - Operation: WriteChunk : 
Trace ID:  : Message: ContainerID 402 creation failed : Result: 
CONTAINER_INTERNAL_ERROR



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-2307) ContextFactory.java contains Windows '^M' at end of each line

2019-10-23 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen resolved HDDS-2307.
--
Resolution: Not A Problem

> ContextFactory.java contains Windows '^M' at end of each line
> -
>
> Key: HDDS-2307
> URL: https://issues.apache.org/jira/browse/HDDS-2307
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Sammi Chen
>Assignee: YiSheng Lien
>Priority: Major
>  Labels: newbie
>
> Convert the file to Unix format. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2307) ContextFactory.java contains Windows '^M' at end of each line

2019-10-23 Thread Sammi Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958494#comment-16958494
 ] 

Sammi Chen commented on HDDS-2307:
--

Hi [~cxorm], thanks for the investigation.  It's actually a Hadoop file, not 
an Ozone file. I will close this JIRA and track it on the Hadoop side. 

> ContextFactory.java contains Windows '^M' at end of each line
> -
>
> Key: HDDS-2307
> URL: https://issues.apache.org/jira/browse/HDDS-2307
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Sammi Chen
>Assignee: YiSheng Lien
>Priority: Major
>  Labels: newbie
>
> Convert the file to Unix format. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2307) ContextFactory.java contains Windows '^M' at end of each line

2019-10-15 Thread Sammi Chen (Jira)
Sammi Chen created HDDS-2307:


 Summary: ContextFactory.java contains Windows '^M' at end of each 
line
 Key: HDDS-2307
 URL: https://issues.apache.org/jira/browse/HDDS-2307
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Sammi Chen


Convert the file to Unix format. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2177) Add a scrubber thread to detect creation-failure pipelines in ALLOCATED state

2019-09-25 Thread Sammi Chen (Jira)
Sammi Chen created HDDS-2177:


 Summary: Add a scrubber thread to detect creation-failure pipelines 
in ALLOCATED state
 Key: HDDS-2177
 URL: https://issues.apache.org/jira/browse/HDDS-2177
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Sammi Chen






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2176) Add new pipeline state “CLOSING” and new CLOSE_PIPELINE_STATUS command

2019-09-25 Thread Sammi Chen (Jira)
Sammi Chen created HDDS-2176:


 Summary: Add new pipeline state “CLOSING” and new 
CLOSE_PIPELINE_STATUS command
 Key: HDDS-2176
 URL: https://issues.apache.org/jira/browse/HDDS-2176
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Sammi Chen
Assignee: Sammi Chen


Currently a pipeline has 3 states: ALLOCATED, OPEN and CLOSED. 
When the create-pipeline command is sent out to the datanodes from SCM, the 
pipeline is marked as ALLOCATED in SCM.  Once SCM has received confirmation of 
pipeline creation from all 3 datanodes, it changes the pipeline's state from 
ALLOCATED to OPEN. 

The close-pipeline process is similar.  Add a new CLOSING state to the 
pipeline. When the close-pipeline command is sent out to the datanodes, the 
pipeline is marked as CLOSING.  When all 3 datanodes have confirmed, the 
pipeline state changes from CLOSING to CLOSED, as sketched below.

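A rough sketch of the proposed state machine (the states come from this 
description; class and method names are illustrative, not the actual SCM code):

{code:java}
enum PipelineState { ALLOCATED, OPEN, CLOSING, CLOSED }

class PipelineStateSketch {
  private static final int REPLICATION = 3;  // 3-datanode Ratis pipeline
  private PipelineState state = PipelineState.ALLOCATED;
  private int confirmations = 0;

  // A datanode confirmed pipeline creation.
  void onCreateConfirmed() {
    if (state == PipelineState.ALLOCATED && ++confirmations == REPLICATION) {
      state = PipelineState.OPEN;
      confirmations = 0;
    }
  }

  // SCM sent the close-pipeline command to the datanodes.
  void startClosing() {
    if (state == PipelineState.OPEN) {
      state = PipelineState.CLOSING;
      confirmations = 0;
    }
  }

  // A datanode reported CLOSE_PIPELINE_STATUS.
  void onCloseConfirmed() {
    if (state == PipelineState.CLOSING && ++confirmations == REPLICATION) {
      state = PipelineState.CLOSED;
    }
  }
}
{code}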


--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1933) Datanode should use hostname in place of ip addresses to allow DN's to work when ipaddress change

2019-09-18 Thread Sammi Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932361#comment-16932361
 ] 

Sammi Chen commented on HDDS-1933:
--

Hi [~msingh], offering users the option to choose IP address or hostname as the 
datanode identity is a tradition in HDFS.  We borrowed the idea from HDFS so 
that Ozone can be easily adopted in network environments where HDFS was 
previously deployed.  In many DCs, static IPs are used for datanodes, and it's 
safe to use the IP address as the datanode identity in that case.  I would 
propose to keep this option for users.  If a cluster uses hostnames because IP 
addresses may change after a restart, as in a Kubernetes cluster, users can 
simply set "dfs.datanode.use.datanode.hostname" to true (the default is 
false), as sketched below.  

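A minimal illustration of flipping that option (standard Hadoop Configuration 
usage; the wrapper class is hypothetical):

{code:java}
import org.apache.hadoop.conf.Configuration;

public class DatanodeIdentityExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // Default is false: datanodes are identified by IP address.
    boolean useHostname =
        conf.getBoolean("dfs.datanode.use.datanode.hostname", false);
    System.out.println("use hostname as identity: " + useHostname);

    // On clusters where IPs change across restarts (e.g. Kubernetes),
    // switch the identity to hostname instead.
    conf.setBoolean("dfs.datanode.use.datanode.hostname", true);
  }
}
{code}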
> Datanode should use hostname in place of ip addresses to allow DN's to work 
> when ipaddress change
> -
>
> Key: HDDS-1933
> URL: https://issues.apache.org/jira/browse/HDDS-1933
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode, SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Priority: Blocker
>
> This was noticed by [~elek] while deploying Ozone in a Kubernetes-based 
> environment.
> When the datanode IP address changes on restart, the datanode details cease 
> to be correct for the datanode, and this prevents the cluster from 
> functioning after a restart.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2034) Async RATIS pipeline creation and destroy through heartbeat commands

2019-09-18 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-2034:
-
Summary: Async RATIS pipeline creation and destroy through heartbeat 
commands  (was: Async pipeline creation and destroy through heartbeat commands)

> Async RATIS pipeline creation and destroy through heartbeat commands
> 
>
> Key: HDDS-2034
> URL: https://issues.apache.org/jira/browse/HDDS-2034
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Major
>
> Currently, pipeline creation and destruction are synchronous operations. SCM 
> directly connects to each datanode of the pipeline through a gRPC channel to 
> create or destroy the pipeline.  
> This task is to remove the gRPC channel and send pipeline creation and 
> destruction actions through heartbeat commands to each datanode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2034) Async pipeline creation and destroy through heartbeat commands

2019-09-18 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-2034:
-
Description: 
Currently, pipeline creation and destruction are synchronous operations. SCM 
directly connects to each datanode of the pipeline through a gRPC channel to 
create or destroy the pipeline.  
This task is to remove the gRPC channel and send pipeline creation and 
destruction actions through heartbeat commands to each datanode.

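A rough sketch of the intended flow (all names here are illustrative, not the 
actual SCM/datanode APIs): SCM queues a command that rides on the next 
heartbeat response, and a dispatcher on the datanode handles it.

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

class HeartbeatCommandSketch {
  enum Type { CREATE_PIPELINE, CLOSE_PIPELINE }

  static final class Command {
    final Type type;
    final String pipelineId;
    Command(Type type, String pipelineId) {
      this.type = type;
      this.pipelineId = pipelineId;
    }
  }

  // SCM side: commands queued per datanode, drained into heartbeat responses.
  private final Queue<Command> pending = new ArrayDeque<>();

  void scheduleCreate(String pipelineId) {
    pending.add(new Command(Type.CREATE_PIPELINE, pipelineId));
  }

  // Datanode side: dispatch whatever arrived with the heartbeat response.
  void onHeartbeatResponse() {
    Command cmd;
    while ((cmd = pending.poll()) != null) {
      switch (cmd.type) {
        case CREATE_PIPELINE:
          System.out.println("creating pipeline " + cmd.pipelineId);
          break;
        case CLOSE_PIPELINE:
          System.out.println("closing pipeline " + cmd.pipelineId);
          break;
      }
    }
  }
}
{code}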
> Async pipeline creation and destroy through heartbeat commands
> --
>
> Key: HDDS-2034
> URL: https://issues.apache.org/jira/browse/HDDS-2034
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Major
>
> Currently, pipeline creation and destruction are synchronous operations. SCM 
> directly connects to each datanode of the pipeline through a gRPC channel to 
> create or destroy the pipeline.  
> This task is to remove the gRPC channel and send pipeline creation and 
> destruction actions through heartbeat commands to each datanode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-2033) Support join multiple pipelines on datanode

2019-09-18 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen reassigned HDDS-2033:


Assignee: Sammi Chen

> Support join multiple pipelines on datanode
> ---
>
> Key: HDDS-2033
> URL: https://issues.apache.org/jira/browse/HDDS-2033
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2034) Async pipeline creation and destroy through heartbeat commands

2019-09-18 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-2034:
-
Summary: Async pipeline creation and destroy through heartbeat commands  
(was: Add create pipeline command dispatcher and handle)

> Async pipeline creation and destroy through heartbeat commands
> --
>
> Key: HDDS-2034
> URL: https://issues.apache.org/jira/browse/HDDS-2034
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2118) Datanode fails to start after stop

2019-09-12 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-2118:
-
Description: 
Steps:
1.  Ran Teragen and generated a few GB of data in a 4-datanode cluster.  
2.  Stopped the datanodes through ./stop-ozone.sh.
3.  Changed the Ozone binaries.
4.  Started the cluster through ./start-ozone.sh.
5.  Two datanodes registered to SCM; two datanodes failed to appear on the SCM 
side.  On the two failed nodes the datanode process is still running, and in 
the logfile I found a lot of the following errors. 

2019-09-12 21:06:45,255 [Datanode State Machine Thread - 0] INFO   - 
Starting XceiverServerRatis ba17ad5e-714e-4d82-85d8-ff2e0737fcf9 at port 9858
2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO   - 
Attempting to start container services.
2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO   - 
Background container scanner has been disabled.
2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO   - 
Starting XceiverServerRatis ba17ad5e-714e-4d82-85d8-ff2e0737fcf9 at port 9858
2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] ERROR  - Unable 
to communicate to SCM server at 10.120.110.183:9861 for past 2100 seconds.
org.apache.ratis.protocol.ChecksumException: LogEntry is corrupt. Calculated 
checksum is -134141393 but read checksum 0
at 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.decodeEntry(SegmentedRaftLogReader.java:299)
at 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.readEntry(SegmentedRaftLogReader.java:185)
at 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogInputStream.nextEntry(SegmentedRaftLogInputStream.java:121)
at 
org.apache.ratis.server.raftlog.segmented.LogSegment.readSegmentFile(LogSegment.java:94)
at 
org.apache.ratis.server.raftlog.segmented.LogSegment.loadSegment(LogSegment.java:117)
at 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogCache.loadSegment(SegmentedRaftLogCache.java:310)
at 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.loadLogSegments(SegmentedRaftLog.java:234)
at 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.openImpl(SegmentedRaftLog.java:204)
at org.apache.ratis.server.raftlog.RaftLog.open(RaftLog.java:247)
at 
org.apache.ratis.server.impl.ServerState.initRaftLog(ServerState.java:190)
at org.apache.ratis.server.impl.ServerState.(ServerState.java:120)
at 
org.apache.ratis.server.impl.RaftServerImpl.(RaftServerImpl.java:110)
at 
org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208)
at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)


  was:
2019-09-12 21:06:45,255 [Datanode State Machine Thread - 0] INFO   - 
Starting XceiverServerRatis ba17ad5e-714e-4d82-85d8-ff2e0737fcf9 at port 9858
2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO   - 
Attempting to start container services.
2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO   - 
Background container scanner has been disabled.
2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO   - 
Starting XceiverServerRatis ba17ad5e-714e-4d82-85d8-ff2e0737fcf9 at port 9858
2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] ERROR  - Unable 
to communicate to SCM server at 10.120.110.183:9861 for past 2100 seconds.
org.apache.ratis.protocol.ChecksumException: LogEntry is corrupt. Calculated 
checksum is -134141393 but read checksum 0
at 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.decodeEntry(SegmentedRaftLogReader.java:299)
at 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.readEntry(SegmentedRaftLogReader.java:185)
at 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogInputStream.nextEntry(SegmentedRaftLogInputStream.java:121)
at 
org.apache.ratis.server.raftlog.segmented.LogSegment.readSegmentFile(LogSegment.java:94)
at 
org.apache.ratis.server.raftlog.segmented.LogSegment.loadSegment(LogSegment.java:117)
at 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogCache.loadSegment(SegmentedRaftLogCache.java:310)
at 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.loadLogSegments(SegmentedRaftLog.java:234)
at 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.openImpl(SegmentedRaftLog.java:204)
at org.apache.ratis.server.raftlog.RaftLog.open(RaftLog.java:247)
at 
org.apache.ratis.server.impl.ServerState.initRaftLog(ServerState.java:190)

[jira] [Updated] (HDDS-2118) Datanode fails to function after stop

2019-09-12 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-2118:
-
Summary: Datanode fails to function after stop  (was: Datanode fails to start 
after stop)

> Datanode fails to function after stop
> 
>
> Key: HDDS-2118
> URL: https://issues.apache.org/jira/browse/HDDS-2118
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Sammi Chen
>Priority: Major
>
> Steps:
> 1.  Ran Teragen and generated a few GB of data in a 4-datanode cluster.  
> 2.  Stopped the datanodes through ./stop-ozone.sh.
> 3.  Changed the Ozone binaries.
> 4.  Started the cluster through ./start-ozone.sh.
> 5.  Two datanodes registered to SCM; two datanodes failed to appear on the 
> SCM side.  On the two failed nodes the datanode process is still running, 
> and in the logfile I found a lot of the following errors. 
> 2019-09-12 21:06:45,255 [Datanode State Machine Thread - 0] INFO   - 
> Starting XceiverServerRatis ba17ad5e-714e-4d82-85d8-ff2e0737fcf9 at port 9858
> 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO   - 
> Attempting to start container services.
> 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO   - 
> Background container scanner has been disabled.
> 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO   - 
> Starting XceiverServerRatis ba17ad5e-714e-4d82-85d8-ff2e0737fcf9 at port 9858
> 2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] ERROR  - 
> Unable to communicate to SCM server at 10.120.110.183:9861 for past 2100 
> seconds.
> org.apache.ratis.protocol.ChecksumException: LogEntry is corrupt. Calculated 
> checksum is -134141393 but read checksum 0
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.decodeEntry(SegmentedRaftLogReader.java:299)
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.readEntry(SegmentedRaftLogReader.java:185)
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogInputStream.nextEntry(SegmentedRaftLogInputStream.java:121)
> at 
> org.apache.ratis.server.raftlog.segmented.LogSegment.readSegmentFile(LogSegment.java:94)
> at 
> org.apache.ratis.server.raftlog.segmented.LogSegment.loadSegment(LogSegment.java:117)
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogCache.loadSegment(SegmentedRaftLogCache.java:310)
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.loadLogSegments(SegmentedRaftLog.java:234)
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.openImpl(SegmentedRaftLog.java:204)
> at org.apache.ratis.server.raftlog.RaftLog.open(RaftLog.java:247)
> at 
> org.apache.ratis.server.impl.ServerState.initRaftLog(ServerState.java:190)
> at 
> org.apache.ratis.server.impl.ServerState.(ServerState.java:120)
> at 
> org.apache.ratis.server.impl.RaftServerImpl.(RaftServerImpl.java:110)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208)
> at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2118) Datanode fails to start after stop

2019-09-12 Thread Sammi Chen (Jira)
Sammi Chen created HDDS-2118:


 Summary: Datanode fails to start after stop
 Key: HDDS-2118
 URL: https://issues.apache.org/jira/browse/HDDS-2118
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Sammi Chen


2019-09-12 21:06:45,255 [Datanode State Machine Thread - 0] INFO   - 
Starting XceiverServerRatis ba17ad5e-714e-4d82-85d8-ff2e0737fcf9 at port 9858
2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO   - 
Attempting to start container services.
2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO   - 
Background container scanner has been disabled.
2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] INFO   - 
Starting XceiverServerRatis ba17ad5e-714e-4d82-85d8-ff2e0737fcf9 at port 9858
2019-09-12 21:06:47,255 [Datanode State Machine Thread - 0] ERROR  - Unable 
to communicate to SCM server at 10.120.110.183:9861 for past 2100 seconds.
org.apache.ratis.protocol.ChecksumException: LogEntry is corrupt. Calculated 
checksum is -134141393 but read checksum 0
at 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.decodeEntry(SegmentedRaftLogReader.java:299)
at 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.readEntry(SegmentedRaftLogReader.java:185)
at 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogInputStream.nextEntry(SegmentedRaftLogInputStream.java:121)
at 
org.apache.ratis.server.raftlog.segmented.LogSegment.readSegmentFile(LogSegment.java:94)
at 
org.apache.ratis.server.raftlog.segmented.LogSegment.loadSegment(LogSegment.java:117)
at 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogCache.loadSegment(SegmentedRaftLogCache.java:310)
at 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.loadLogSegments(SegmentedRaftLog.java:234)
at 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.openImpl(SegmentedRaftLog.java:204)
at org.apache.ratis.server.raftlog.RaftLog.open(RaftLog.java:247)
at 
org.apache.ratis.server.impl.ServerState.initRaftLog(ServerState.java:190)
at org.apache.ratis.server.impl.ServerState.(ServerState.java:120)
at 
org.apache.ratis.server.impl.RaftServerImpl.(RaftServerImpl.java:110)
at 
org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208)
at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2106) Avoid usage of hadoop projects as parent of hdds/ozone

2019-09-12 Thread Sammi Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928315#comment-16928315
 ] 

Sammi Chen commented on HDDS-2106:
--

Hi [~elek], I met a build issue after rebasing on trunk.  The following is the 
console log.  I used the build command
"mvn clean install -T 6 -Pdist -Phdds -DskipTests -Dmaven.javadoc.skip=true -am 
-pl :hadoop-ozone-dist"

I see maven-javadoc-plugin.version is defined as "3.0.1".   My local Maven is 
3.6.0.  I don't know why the build fails. 

[INFO] Scanning for projects...
[ERROR] [ERROR] Some problems were encountered while processing the POMs:
[ERROR] 'build.plugins.plugin.version' for 
org.apache.maven.plugins:maven-javadoc-plugin must be a valid version but is 
'${maven-javadoc-plugin.version}'. @ 
org.apache.hadoop:hadoop-main-ozone:0.5.0-SNAPSHOT, 
/Users/sammi/workspace/hadoop/pom.ozone.xml, line 1604, column 20
[ERROR] 'build.plugins.plugin.version' for 
org.apache.maven.plugins:maven-javadoc-plugin must be a valid version but is 
'${maven-javadoc-plugin.version}'. @ 
org.apache.hadoop:hadoop-main-ozone:0.5.0-SNAPSHOT, 
/Users/sammi/workspace/hadoop/pom.ozone.xml, line 1604, column 20


> Avoid usage of hadoop projects as parent of hdds/ozone
> --
>
> Key: HDDS-2106
> URL: https://issues.apache.org/jira/browse/HDDS-2106
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Ozone uses hadoop as a dependency. The dependency is defined on multiple 
> levels:
>  1. the hadoop artifacts are defined in the  sections
>  2. both hadoop-ozone and hadoop-hdds projects use "hadoop-project" as the 
> parent
> As we already have a slightly different assembly process, it could be more 
> resilient to use a dedicated parent project instead of the hadoop one. With 
> this approach it will be easier to upgrade the versions, as we don't need to 
> be careful about the pom contents, only about the used dependencies.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-2034) Add create pipeline command dispatcher and handle

2019-09-10 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen reassigned HDDS-2034:


Assignee: Sammi Chen

> Add create pipeline command dispatcher and handle
> -
>
> Key: HDDS-2034
> URL: https://issues.apache.org/jira/browse/HDDS-2034
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2069) Default values of property hdds.datanode.storage.utilization.critical.threshold and hdds.datanode.storage.utilization.warning.threshold are not reasonable

2019-09-02 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-2069:
-
Summary: Default values of property 
hdds.datanode.storage.utilization.critical.threshold and 
hdds.datanode.storage.utilization.warning.threshold are not reasonable  (was: 
Value of property hdds.datanode.storage.utilization.critical.threshold and 
hdds.datanode.storage.utilization.warning.threshold is not reasonable)

> Default values of property 
> hdds.datanode.storage.utilization.critical.threshold and 
> hdds.datanode.storage.utilization.warning.threshold are not reasonable
> --
>
> Key: HDDS-2069
> URL: https://issues.apache.org/jira/browse/HDDS-2069
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Major
>
> Currently, hdds.datanode.storage.utilization.warning.threshold is 0.95 and 
> hdds.datanode.storage.utilization.critical.threshold is 0.75.
> The values should be swapped. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2069) Value of property hdds.datanode.storage.utilization.critical.threshold and hdds.datanode.storage.utilization.warning.threshold is not reasonable

2019-09-02 Thread Sammi Chen (Jira)
Sammi Chen created HDDS-2069:


 Summary: Value of property 
hdds.datanode.storage.utilization.critical.threshold and 
hdds.datanode.storage.utilization.warning.threshold is not reasonable
 Key: HDDS-2069
 URL: https://issues.apache.org/jira/browse/HDDS-2069
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Sammi Chen
Assignee: Sammi Chen


Currently, hdds.datanode.storage.utilization.warning.threshold is 0.95 and 
hdds.datanode.storage.utilization.critical.threshold is 0.75.
The values should be swapped. 

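A small sketch of why the ordering matters, assuming the warning alert is meant 
to trip before the critical one (threshold values illustrative):

{code:java}
class StorageUtilizationSketch {
  // Swapped relative to the shipped defaults: warn first, then critical.
  static final double WARNING_THRESHOLD = 0.75;
  static final double CRITICAL_THRESHOLD = 0.95;

  static String classify(double usedFraction) {
    if (usedFraction >= CRITICAL_THRESHOLD) {
      return "CRITICAL";
    }
    if (usedFraction >= WARNING_THRESHOLD) {
      return "WARNING";
    }
    return "OK";
  }

  public static void main(String[] args) {
    System.out.println(classify(0.80));  // WARNING
    System.out.println(classify(0.97));  // CRITICAL
  }
}
{code}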


--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1553) Add metrics in rack aware container placement policy

2019-08-27 Thread Sammi Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917357#comment-16917357
 ] 

Sammi Chen commented on HDDS-1553:
--

Hi [~ljain], I just attached the initial patch.  Feel free to give any 
feedback. 

> Add metrics in rack aware container placement policy
> 
>
> Key: HDDS-1553
> URL: https://issues.apache.org/jira/browse/HDDS-1553
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> To collect the following statistics: 
> 1. total requested datanode count (A)
> 2. successfully allocated datanode count without constraint compromise (B)
> 3. successfully allocated datanode count with some constraint compromise (C)
> B includes C; failed allocations = (A - B).  See the sketch below.

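A tiny sketch of the counters A, B and C described above (the real patch 
presumably uses the Hadoop metrics framework, which is omitted here):

{code:java}
import java.util.concurrent.atomic.AtomicLong;

class PlacementMetricsSketch {
  final AtomicLong totalRequested = new AtomicLong();        // A
  final AtomicLong allocated = new AtomicLong();             // B (includes C)
  final AtomicLong allocatedCompromised = new AtomicLong();  // C

  void onRequest(int datanodeCount) {
    totalRequested.addAndGet(datanodeCount);
  }

  void onAllocated(boolean constraintCompromised) {
    allocated.incrementAndGet();
    if (constraintCompromised) {
      allocatedCompromised.incrementAndGet();
    }
  }

  long failedAllocations() {
    return totalRequested.get() - allocated.get();  // A - B
  }
}
{code}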


--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1571) Create an interface for pipeline placement policy to support network topologies

2019-08-26 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen reassigned HDDS-1571:


Assignee: Sammi Chen  (was: Siddharth Wagle)

> Create an interface for pipeline placement policy to support network 
> topologies
> ---
>
> Key: HDDS-1571
> URL: https://issues.apache.org/jira/browse/HDDS-1571
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Siddharth Wagle
>Assignee: Sammi Chen
>Priority: Major
>
> Leverage the work done in HDDS-700 for pipeline creation for open containers.
> Create an interface that can provide different policy implementations for 
> pipeline creation. The default implementation should handle the case where no 
> topology information is configured.
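A rough sketch of what such an interface could look like, under the assumption that it mirrors the existing container placement API (the interface name below is illustrative only):

{code:java}
import java.util.List;
import org.apache.hadoop.hdds.protocol.DatanodeDetails;
import org.apache.hadoop.hdds.scm.exceptions.SCMException;

// Illustrative only: choose datanodes for a new pipeline, whether or not
// network topology information is configured.
public interface PipelinePlacementPolicySketch {
  List<DatanodeDetails> chooseDatanodes(
      List<DatanodeDetails> excludedNodes,
      int nodesRequired) throws SCMException;
}
{code}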



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1569) Add ability to SCM for creating multiple pipelines with same datanode

2019-08-26 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen reassigned HDDS-1569:


Assignee: Li Cheng  (was: Siddharth Wagle)

> Add ability to SCM for creating multiple pipelines with same datanode
> -
>
> Key: HDDS-1569
> URL: https://issues.apache.org/jira/browse/HDDS-1569
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Siddharth Wagle
>Assignee: Li Cheng
>Priority: Major
>
> - Refactor _RatisPipelineProvider.create()_ to be able to create pipelines 
> with datanodes that are not yet part of enough pipelines
> - Define soft and hard upper bounds for pipeline membership (see the sketch 
> below)
> - Create SCMAllocationManager that can be leveraged to get a candidate set of 
> datanodes based on placement policies
> - Add the datanodes to internal data structures
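As a hedged sketch of the soft/hard bound idea referenced in the list above (the class name, helper shape, and the limits 2 and 5 are all assumptions, not values from the design doc):

{code:java}
import java.util.List;
import java.util.stream.Collectors;
import org.apache.hadoop.hdds.protocol.DatanodeDetails;

// Hypothetical helper: prefer nodes under the soft limit, tolerate up to the
// hard limit when no preferred node is available.
final class PipelineMembershipSketch {
  static final int SOFT_LIMIT = 2;  // preferred max pipelines per datanode (assumed)
  static final int HARD_LIMIT = 5;  // absolute max pipelines per datanode (assumed)

  static List<DatanodeDetails> candidates(List<DatanodeDetails> healthyNodes,
      java.util.function.ToIntFunction<DatanodeDetails> pipelineCount) {
    List<DatanodeDetails> preferred = healthyNodes.stream()
        .filter(dn -> pipelineCount.applyAsInt(dn) < SOFT_LIMIT)
        .collect(Collectors.toList());
    if (!preferred.isEmpty()) {
      return preferred;
    }
    // Fall back to nodes that are still under the hard limit.
    return healthyNodes.stream()
        .filter(dn -> pipelineCount.applyAsInt(dn) < HARD_LIMIT)
        .collect(Collectors.toList());
  }
}
{code}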



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2036) Multi-raft support on single datanode integration test

2019-08-26 Thread Sammi Chen (Jira)
Sammi Chen created HDDS-2036:


 Summary: Multi-raft support on single datanode integration test
 Key: HDDS-2036
 URL: https://issues.apache.org/jira/browse/HDDS-2036
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Sammi Chen


Improve MiniOzoneCluster to support multi-raft groups.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2035) Improve CLI listPipeline

2019-08-26 Thread Sammi Chen (Jira)
Sammi Chen created HDDS-2035:


 Summary: Improve CLI listPipeline
 Key: HDDS-2035
 URL: https://issues.apache.org/jira/browse/HDDS-2035
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Sammi Chen


1. Filter pipelines by datanode




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1574) ensure same datanodes are not a part of multiple pipelines

2019-08-26 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-1574:
-
Summary: ensure same datanodes are not a part of multiple pipelines  (was: 
Implement pipeline placement policy to ensure same datanodes are not a part of 
multiple pipelines)

> ensure same datanodes are not a part of multiple pipelines
> --
>
> Key: HDDS-1574
> URL: https://issues.apache.org/jira/browse/HDDS-1574
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Siddharth Wagle
>Assignee: Siddharth Wagle
>Priority: Major
>
> Details in design doc.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2034) Add create pipeline command dispatcher and handle

2019-08-26 Thread Sammi Chen (Jira)
Sammi Chen created HDDS-2034:


 Summary: Add create pipeline command dispatcher and handle
 Key: HDDS-2034
 URL: https://issues.apache.org/jira/browse/HDDS-2034
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Sammi Chen






--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1574) Implement pipeline placement policy to ensure same datanodes are not a part of multiple pipelines

2019-08-26 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-1574:
-
Summary: Implement pipeline placement policy to ensure same datanodes are 
not a part of multiple pipelines  (was: Implement pipeline choose policy to 
ensure same datanodes are not a part of multiple pipelines)

> Implement pipeline placement policy to ensure same datanodes are not a part 
> of multiple pipelines
> -
>
> Key: HDDS-1574
> URL: https://issues.apache.org/jira/browse/HDDS-1574
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Siddharth Wagle
>Assignee: Siddharth Wagle
>Priority: Major
>
> Details in design doc.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1574) Implement pipeline choose policy to ensure same datanodes are not a part of multiple pipelines

2019-08-26 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-1574:
-
Summary: Implement pipeline choose policy to ensure same datanodes are not 
a part of multiple pipelines  (was: Ensure that same datanodes are not a part 
of multiple pipelines)

> Implement pipeline choose policy to ensure same datanodes are not a part of 
> multiple pipelines
> --
>
> Key: HDDS-1574
> URL: https://issues.apache.org/jira/browse/HDDS-1574
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Siddharth Wagle
>Assignee: Siddharth Wagle
>Priority: Major
>
> Details in design doc.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2033) Support join multiple pipelines on datanode

2019-08-26 Thread Sammi Chen (Jira)
Sammi Chen created HDDS-2033:


 Summary: Support join multiple pipelines on datanode
 Key: HDDS-2033
 URL: https://issues.apache.org/jira/browse/HDDS-2033
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Sammi Chen






--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1930) Test Topology Aware Job scheduling with Ozone Topology

2019-08-25 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-1930:
-
Description: 
My initial results with Terasort do not seem to report the counters properly. 
Most of the requests are handled as rack local but none as node local. This 
ticket is opened to add more system testing to validate the feature. 

Total Allocated Containers: 3778
Each table cell represents the number of NodeLocal/RackLocal/OffSwitch 
containers satisfied by NodeLocal/RackLocal/OffSwitch resource requests.
                                          Node Local Request  Rack Local Request  Off Switch Request
Num Node Local Containers (satisfied by)  0
Num Rack Local Containers (satisfied by)  0                   3648
Num Off Switch Containers (satisfied by)  0                   96                  34

  was:
My initial results with Terasort do not seem to report the counters properly. 
Most of the requests are handled as rack locl but none as node local. This 
ticket is opened to add more system testing to validate the feature. 

Total Allocated Containers: 3778
Each table cell represents the number of NodeLocal/RackLocal/OffSwitch 
containers satisfied by NodeLocal/RackLocal/OffSwitch resource requests.
                                          Node Local Request  Rack Local Request  Off Switch Request
Num Node Local Containers (satisfied by)  0
Num Rack Local Containers (satisfied by)  0                   3648
Num Off Switch Containers (satisfied by)  0                   96                  34


> Test Topology Aware Job scheduling with Ozone Topology
> --
>
> Key: HDDS-1930
> URL: https://issues.apache.org/jira/browse/HDDS-1930
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Priority: Major
>
> My initial results with Terasort do not seem to report the counters 
> properly. Most of the requests are handled as rack local but none as node 
> local. This ticket is opened to add more system testing to validate the 
> feature. 
> Total Allocated Containers: 3778
> Each table cell represents the number of NodeLocal/RackLocal/OffSwitch 
> containers satisfied by NodeLocal/RackLocal/OffSwitch resource requests.
>                                           Node Local Request  Rack Local Request  Off Switch Request
> Num Node Local Containers (satisfied by)  0
> Num Rack Local Containers (satisfied by)  0                   3648
> Num Off Switch Containers (satisfied by)  0                   96                  34



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2031) Choose datanode for pipeline creation based on network topology

2019-08-25 Thread Sammi Chen (Jira)
Sammi Chen created HDDS-2031:


 Summary: Choose datanode for pipeline creation based on network 
topology
 Key: HDDS-2031
 URL: https://issues.apache.org/jira/browse/HDDS-2031
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Sammi Chen
Assignee: Sammi Chen


There are regular heartbeats between the datanodes in a pipeline. Choose 
datanodes based on network topology to improve data reliability and to reduce 
heartbeat network latency.
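A minimal sketch of topology-aware selection under these goals (the helper below and its distance function are hypothetical; the real SCM code plugs into its NetworkTopology classes):

{code:java}
import java.util.Comparator;
import java.util.List;
import org.apache.hadoop.hdds.protocol.DatanodeDetails;

// Hypothetical sketch: given an anchor node, prefer the healthy nodes with the
// smallest network distance so intra-pipeline heartbeats stay cheap.
final class TopologyAwareChoiceSketch {
  static List<DatanodeDetails> closestTo(DatanodeDetails anchor,
      List<DatanodeDetails> healthy,
      java.util.function.ToIntBiFunction<DatanodeDetails, DatanodeDetails> distance,
      int count) {
    return healthy.stream()
        .filter(dn -> !dn.equals(anchor))
        .sorted(Comparator.comparingInt(dn -> distance.applyAsInt(anchor, dn)))
        .limit(count)
        .collect(java.util.stream.Collectors.toList());
  }
}
{code}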



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1963) OM DB Schema definition in OmMetadataManagerImpl and OzoneConsts are not consistent

2019-08-14 Thread Sammi Chen (JIRA)
Sammi Chen created HDDS-1963:


 Summary: OM DB Schema definition in OmMetadataManagerImpl and 
OzoneConsts are not consistent
 Key: HDDS-1963
 URL: https://issues.apache.org/jira/browse/HDDS-1963
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Sammi Chen


OzoneConsts.java

 * OM DB Schema:
   *  ------------------------------------------------------------
   *  |  KEY                                     |  VALUE        |
   *  ------------------------------------------------------------
   *  | $userName                                |  VolumeList   |
   *  ------------------------------------------------------------
   *  | /#volumeName                             |  VolumeInfo   |
   *  ------------------------------------------------------------
   *  | /#volumeName/#bucketName                 |  BucketInfo   |
   *  ------------------------------------------------------------
   *  | /volumeName/bucketName/keyName           |  KeyInfo      |
   *  ------------------------------------------------------------
   *  | #deleting#/volumeName/bucketName/keyName |  KeyInfo      |
   *  ------------------------------------------------------------

OmMetadataManagerImpl.java

/**
   * OM RocksDB Structure.
   *
   * OM DB stores metadata as KV pairs in different column families.
   *
   * OM DB Schema:
   * |-----------------------------------------------------------------|
   * | Column Family   | VALUE                                         |
   * |-----------------------------------------------------------------|
   * | userTable       | user -> VolumeList                            |
   * |-----------------------------------------------------------------|
   * | volumeTable     | /volume -> VolumeInfo                         |
   * |-----------------------------------------------------------------|
   * | bucketTable     | /volume/bucket -> BucketInfo                  |
   * |-----------------------------------------------------------------|
   * | keyTable        | /volumeName/bucketName/keyName -> KeyInfo     |
   * |-----------------------------------------------------------------|
   * | deletedTable    | /volumeName/bucketName/keyName -> KeyInfo     |
   * |-----------------------------------------------------------------|
   * | openKey         | /volumeName/bucketName/keyName/id -> KeyInfo  |
   * |-----------------------------------------------------------------|
   * | s3Table         | s3BucketName -> /volumeName/bucketName        |
   * |-----------------------------------------------------------------|
   * | s3SecretTable   | s3g_access_key_id -> s3Secret                 |
   * |-----------------------------------------------------------------|
   * | dTokenTable     | s3g_access_key_id -> s3Secret                 |
   * |-----------------------------------------------------------------|
   * | prefixInfoTable | prefix -> PrefixInfo                          |
   * |-----------------------------------------------------------------|
   */

It's better to define the OM DB schema in a single place; the redundant copies 
above have already drifted out of sync. 
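One hedged way to resolve the redundancy, sketched with the table names quoted above (the consolidating class itself is hypothetical):

{code:java}
// Illustrative sketch: a single authority for OM table names, so the schema
// is documented exactly once and referenced everywhere else.
public final class OmDbSchemaSketch {
  public static final String USER_TABLE = "userTable";       // user -> VolumeList
  public static final String VOLUME_TABLE = "volumeTable";   // /volume -> VolumeInfo
  public static final String BUCKET_TABLE = "bucketTable";   // /volume/bucket -> BucketInfo
  public static final String KEY_TABLE = "keyTable";         // /volume/bucket/key -> KeyInfo
  public static final String DELETED_TABLE = "deletedTable"; // /volume/bucket/key -> KeyInfo

  private OmDbSchemaSketch() { }
}
{code}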





--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1897) SCMNodeManager.java#getNodeByAddress cannot find nodes by addresses

2019-08-13 Thread Sammi Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen reassigned HDDS-1897:


Assignee: Li Cheng

> SCMNodeManager.java#getNodeByAddress cannot find nodes by addresses
> ---
>
> Key: HDDS-1897
> URL: https://issues.apache.org/jira/browse/HDDS-1897
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Mukul Kumar Singh
>Assignee: Li Cheng
>Priority: Major
>  Labels: MiniOzoneChaosCluster
>
> SCMNodeManager cannot find the nodes via IP addresses in MiniOzoneChaosCluster
> {code}
> 2019-08-02 13:57:01,501 WARN  node.SCMNodeManager 
> (SCMNodeManager.java:getNodeByAddress(599)) - Cannot find node for address 
> 127.0.0.1
> {code}
> cc: [~xyao] & [~Sammi]
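One plausible explanation, offered here as an assumption: in MiniOzoneChaosCluster every datanode shares 127.0.0.1, so an index that keeps a single node per address drops all but one. A sketch of an address-to-list index (the class and method names are hypothetical; getIpAddress() is assumed from DatanodeDetails):

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.hdds.protocol.DatanodeDetails;

// Hypothetical sketch (not the actual fix): index datanodes by address as a
// list, since several datanodes can share one IP address.
final class AddressIndexSketch {
  private final Map<String, List<DatanodeDetails>> byAddress =
      new ConcurrentHashMap<>();

  void register(DatanodeDetails dn) {
    byAddress.computeIfAbsent(dn.getIpAddress(), k -> new ArrayList<>()).add(dn);
  }

  List<DatanodeDetails> getNodesByAddress(String address) {
    return byAddress.getOrDefault(address, Collections.emptyList());
  }
}
{code}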



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1958) Container time information returned by "scmcli list" is not human friendly

2019-08-13 Thread Sammi Chen (JIRA)
Sammi Chen created HDDS-1958:


 Summary: Container time information returned by "scmcli list" is 
not human friendly
 Key: HDDS-1958
 URL: https://issues.apache.org/jira/browse/HDDS-1958
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Sammi Chen


ozone scmcli list -s=0
{
  "state" : "OPEN",
  "replicationFactor" : "ONE",
  "replicationType" : "STAND_ALONE",
  "usedBytes" : 0,
  "numberOfKeys" : 0,
  "lastUsed" : 13353985,
  "stateEnterTime" : 13316615,
  "owner" : "OZONE",
  "containerID" : 1,
  "deleteTransactionId" : 0,
  "sequenceId" : 0,
  "open" : true
}

lastUsed and stateEnterTime are not human friendly.
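For illustration, a small sketch of rendering such fields as human-readable text (this assumes the CLI first converts them to wall-clock epoch milliseconds; the raw values above look like relative times, so that conversion would be part of the fix):

{code:java}
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Sketch only: format a millisecond timestamp as readable local time.
final class HumanTimeSketch {
  private static final DateTimeFormatter FMT = DateTimeFormatter
      .ofPattern("yyyy-MM-dd HH:mm:ss z")
      .withZone(ZoneId.systemDefault());

  static String format(long epochMillis) {
    return FMT.format(Instant.ofEpochMilli(epochMillis));
  }
}
{code}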



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1953) Remove pipeline persistence in SCM

2019-08-11 Thread Sammi Chen (JIRA)
Sammi Chen created HDDS-1953:


 Summary: Remove pipeline persistence in SCM
 Key: HDDS-1953
 URL: https://issues.apache.org/jira/browse/HDDS-1953
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Sammi Chen
Assignee: Sammi Chen


Currently, SCM persists pipelines, including their datanode information, in its 
local metastore. After a restart, SCM reloads all pipelines from the metastore. 
If any datanode information changes during the SCM lifecycle, the persisted 
pipelines are not updated. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1882) TestReplicationManager failed with NPE in ReplicationManager.java

2019-08-01 Thread Sammi Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-1882:
-
Summary: TestReplicationManager failed with NPE in ReplicationManager.java  
 (was: TestReplicationManager failed with NPE)

> TestReplicationManager failed with NPE in ReplicationManager.java 
> --
>
> Key: HDDS-1882
> URL: https://issues.apache.org/jira/browse/HDDS-1882
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1882) TestReplicationManager failed with NPE

2019-07-31 Thread Sammi Chen (JIRA)
Sammi Chen created HDDS-1882:


 Summary: TestReplicationManager failed with NPE
 Key: HDDS-1882
 URL: https://issues.apache.org/jira/browse/HDDS-1882
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Sammi Chen
Assignee: Sammi Chen






--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1879) Support multiple excluded scopes when choosing datanodes in NetworkTopology

2019-07-31 Thread Sammi Chen (JIRA)
Sammi Chen created HDDS-1879:


 Summary: Support multiple excluded scopes when choosing datanodes 
in NetworkTopology
 Key: HDDS-1879
 URL: https://issues.apache.org/jira/browse/HDDS-1879
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Sammi Chen
Assignee: Sammi Chen






--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1865) Use "ozone.network.topology.aware.read" to control both RPC client and server side logic

2019-07-29 Thread Sammi Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-1865:
-
Summary: Use "ozone.network.topology.aware.read" to control both RPC client 
and server side logic   (was: Use "dfs.network.topology.aware.read.enable" to 
control both RPC client and server side logic )

> Use "ozone.network.topology.aware.read" to control both RPC client and server 
> side logic 
> -
>
> Key: HDDS-1865
> URL: https://issues.apache.org/jira/browse/HDDS-1865
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Major
>
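For context, a minimal sketch of how a single key can gate both sides (assuming Hadoop's Configuration API; the default value below is an assumption):

{code:java}
import org.apache.hadoop.conf.Configuration;

// Hedged sketch: both the RPC client and the server read the same key, so one
// switch controls topology aware reads end to end.
final class TopologyAwareReadFlagSketch {
  static final String KEY = "ozone.network.topology.aware.read";

  static boolean isEnabled(Configuration conf) {
    return conf.getBoolean(KEY, false); // default assumed, not from the patch
  }
}
{code}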




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1865) Use "dfs.network.topology.aware.read.enable" to control both clien…

2019-07-29 Thread Sammi Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-1865:
-
Summary: Use "dfs.network.topology.aware.read.enable" to control both 
clien…  (was: Use "ozone.distance.aware.read.enable" to control both client and 
OM side distance aware read logic)

> Use "dfs.network.topology.aware.read.enable" to control both clien…
> ---
>
> Key: HDDS-1865
> URL: https://issues.apache.org/jira/browse/HDDS-1865
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1865) Use "dfs.network.topology.aware.read.enable" to control both RPC client and server side logic

2019-07-29 Thread Sammi Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-1865:
-
Summary: Use "dfs.network.topology.aware.read.enable" to control both RPC 
client and server side logic   (was: Use 
"dfs.network.topology.aware.read.enable" to control both clien…)

> Use "dfs.network.topology.aware.read.enable" to control both RPC client and 
> server side logic 
> --
>
> Key: HDDS-1865
> URL: https://issues.apache.org/jira/browse/HDDS-1865
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1865) Use "ozone.distance.aware.read.enable" to control both client and OM side distance aware read logic

2019-07-26 Thread Sammi Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-1865:
-
Summary: Use "ozone.distance.aware.read.enable" to control both client and 
OM side distance aware read logic  (was: Use "ozone.distance.aware.read.enable" 
to control both client side and OM side topology aware read logic)

> Use "ozone.distance.aware.read.enable" to control both client and OM side 
> distance aware read logic
> ---
>
> Key: HDDS-1865
> URL: https://issues.apache.org/jira/browse/HDDS-1865
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1865) Use "ozone.distance.aware.read.enable" to control both client side and OM side topology aware read logic

2019-07-26 Thread Sammi Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-1865:
-
Summary: Use "ozone.distance.aware.read.enable" to control both client side 
and OM side topology aware read logic  (was: Use 
"dfs.network.topology.aware.read.enable" to control both client side and OM 
side topology aware read logic)

> Use "ozone.distance.aware.read.enable" to control both client side and OM 
> side topology aware read logic
> 
>
> Key: HDDS-1865
> URL: https://issues.apache.org/jira/browse/HDDS-1865
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1865) Use "dfs.network.topology.aware.read.enable" to control both client side and OM side topology aware read logic

2019-07-25 Thread Sammi Chen (JIRA)
Sammi Chen created HDDS-1865:


 Summary: Use "dfs.network.topology.aware.read.enable" to control 
both client side and OM side topology aware read logic
 Key: HDDS-1865
 URL: https://issues.apache.org/jira/browse/HDDS-1865
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Sammi Chen
Assignee: Sammi Chen






--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1864) Turn on topology aware read in TestFailureHandlingByClient

2019-07-25 Thread Sammi Chen (JIRA)
Sammi Chen created HDDS-1864:


 Summary: Turn on topology aware read in TestFailureHandlingByClient
 Key: HDDS-1864
 URL: https://issues.apache.org/jira/browse/HDDS-1864
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Sammi Chen
Assignee: Sammi Chen






--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1707) SCMContainerPlacementRackAware#chooseDatanodes throws not enough datanodes when all nodes(40) are up

2019-07-25 Thread Sammi Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16893282#comment-16893282
 ] 

Sammi Chen commented on HDDS-1707:
--

Thanks [~msingh] for reporting this. It has been fixed by the code change in HDDS-1713. 

> SCMContainerPlacementRackAware#chooseDatanodes throws not enough datanodes 
> when all nodes(40) are up
> 
>
> Key: HDDS-1707
> URL: https://issues.apache.org/jira/browse/HDDS-1707
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Mukul Kumar Singh
>Priority: Major
>
> SCMContainerPlacementRackAware#chooseDatanodes is failing with the following 
> error repeatedly.
> {code}
> 2019-06-17 22:15:52,455 WARN 
> org.apache.hadoop.hdds.scm.container.ReplicationManager: Exception while 
> replicating container 407.
> org.apache.hadoop.hdds.scm.exceptions.SCMException: No enough datanodes to 
> choose.
> at 
> org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackAware.chooseDatanodes(SCMContainerPlacementRackAware.java:100)
> at 
> org.apache.hadoop.hdds.scm.container.ReplicationManager.handleUnderReplicatedContainer(ReplicationManager.java:487)
> at 
> org.apache.hadoop.hdds.scm.container.ReplicationManager.processContainer(ReplicationManager.java:293)
> at 
> java.util.concurrent.ConcurrentHashMap$KeySetView.forEach(ConcurrentHashMap.java:4649)
> at 
> java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080)
> at 
> org.apache.hadoop.hdds.scm.container.ReplicationManager.run(ReplicationManager.java:205)
> at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1707) SCMContainerPlacementRackAware#chooseDatanodes throws not enough datanodes when all nodes(40) are up

2019-07-25 Thread Sammi Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen resolved HDDS-1707.
--
Resolution: Fixed
  Assignee: Sammi Chen

> SCMContainerPlacementRackAware#chooseDatanodes throws not enough datanodes 
> when all nodes(40) are up
> 
>
> Key: HDDS-1707
> URL: https://issues.apache.org/jira/browse/HDDS-1707
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Mukul Kumar Singh
>Assignee: Sammi Chen
>Priority: Major
>
> SCMContainerPlacementRackAware#chooseDatanodes is failing with the following 
> error repeatedly.
> {code}
> 2019-06-17 22:15:52,455 WARN 
> org.apache.hadoop.hdds.scm.container.ReplicationManager: Exception while 
> replicating container 407.
> org.apache.hadoop.hdds.scm.exceptions.SCMException: No enough datanodes to 
> choose.
> at 
> org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackAware.chooseDatanodes(SCMContainerPlacementRackAware.java:100)
> at 
> org.apache.hadoop.hdds.scm.container.ReplicationManager.handleUnderReplicatedContainer(ReplicationManager.java:487)
> at 
> org.apache.hadoop.hdds.scm.container.ReplicationManager.processContainer(ReplicationManager.java:293)
> at 
> java.util.concurrent.ConcurrentHashMap$KeySetView.forEach(ConcurrentHashMap.java:4649)
> at 
> java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080)
> at 
> org.apache.hadoop.hdds.scm.container.ReplicationManager.run(ReplicationManager.java:205)
> at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1809) Ozone Read fails with StatusRunTimeExceptions after 2 datanode fail in Ratis pipeline

2019-07-25 Thread Sammi Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen resolved HDDS-1809.
--
Resolution: Fixed

> Ozone Read fails with StatusRunTimeExceptions after 2 datanode fail in Ratis 
> pipeline
> -
>
> Key: HDDS-1809
> URL: https://issues.apache.org/jira/browse/HDDS-1809
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Sammi Chen
>Priority: Major
> Fix For: 0.5.0
>
>
> {code:java}
> java.io.IOException: Unexpected OzoneException: java.io.IOException: 
> java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception
> at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:342)
> at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkFromContainer(ChunkInputStream.java:307)
> at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.prepareRead(ChunkInputStream.java:259)
> at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.read(ChunkInputStream.java:144)
> at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:239)
> at 
> org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171)
> at 
> org.apache.hadoop.ozone.client.io.OzoneInputStream.read(OzoneInputStream.java:47)
> at java.io.InputStream.read(InputStream.java:101)
> at 
> org.apache.hadoop.ozone.container.ContainerTestHelper.validateData(ContainerTestHelper.java:709)
> at 
> org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient.validateData(TestFailureHandlingByClient.java:458)
> at 
> org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient.testBlockWritesWithDnFailures(TestFailureHandlingByClient.java:158)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
> at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
> at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
> at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
> at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1855) TestStorageContainerManager#testScmProcessDatanodeHeartbeat is failing

2019-07-25 Thread Sammi Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-1855:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> TestStorageContainerManager#testScmProcessDatanodeHeartbeat is failing
> --
>
> Key: HDDS-1855
> URL: https://issues.apache.org/jira/browse/HDDS-1855
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {{TestStorageContainerManager#testScmProcessDatanodeHeartbeat}} is failing 
> with the following exception
> {noformat}
> [ERROR] Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 106.315 s <<< FAILURE! - in 
> org.apache.hadoop.ozone.TestStorageContainerManager
> [ERROR] 
> testScmProcessDatanodeHeartbeat(org.apache.hadoop.ozone.TestStorageContainerManager)
>   Time elapsed: 21.97 s  <<< FAILURE!
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.ozone.TestStorageContainerManager.testScmProcessDatanodeHeartbeat(TestStorageContainerManager.java:531)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1751) replication of underReplicated container fails with SCMContainerPlacementRackAware policy

2019-07-23 Thread Sammi Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen resolved HDDS-1751.
--
Resolution: Fixed

> replication of underReplicated container fails with 
> SCMContainerPlacementRackAware policy
> -
>
> Key: HDDS-1751
> URL: https://issues.apache.org/jira/browse/HDDS-1751
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Sammi Chen
>Priority: Major
>  Labels: MiniOzoneChaosCluster
>
> SCM container replication fails with
> {code}
> 2019-07-02 18:26:41,564 WARN  container.ReplicationManager 
> (ReplicationManager.java:handleUnderReplicatedContainer(501)) - Exception 
> while replicating container 18.
> org.apache.hadoop.hdds.scm.exceptions.SCMException: No enough datanodes to 
> choose.
> at 
> org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackAware.chooseDatanodes(SCMContainerPlacementRackAware.java:100)
> at 
> org.apache.hadoop.hdds.scm.container.ReplicationManager.handleUnderReplicatedContainer(ReplicationManager.java:487)
> at 
> org.apache.hadoop.hdds.scm.container.ReplicationManager.processContainer(ReplicationManager.java:293)
> at 
> java.util.concurrent.ConcurrentHashMap$KeySetView.forEach(ConcurrentHashMap.java:4649)
> at 
> java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080)
> at 
> org.apache.hadoop.hdds.scm.container.ReplicationManager.run(ReplicationManager.java:205)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1751) replication of underReplicated container fails with SCMContainerPlacementRackAware policy

2019-07-23 Thread Sammi Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890814#comment-16890814
 ] 

Sammi Chen commented on HDDS-1751:
--

Yes, it's fixed by HDDS-1713. I ran "src/test/bin/start-chaos.sh" locally with 
SCMContainerPlacementRackAware as the placement policy. Here is the log:

2019-07-23 16:57:38,336 INFO  container.ReplicationManager 
(ReplicationManager.java:handleUnderReplicatedContainer(489)) - Container #3 is 
under replicated. Expected replica count is 3, but found 2.
2019-07-23 16:57:38,336 INFO  container.ReplicationManager 
(ReplicationManager.java:sendReplicateCommand(652)) - Sending replicate 
container command for container #3 to datanode 
e4635174-5f4b-4141-aea3-d994486370aa{ip: 127.0.0.1, host: vm-centos, 
networkLocation: /default-rack, certSerialId: null}
2019-07-23 16:57:38,336 INFO  container.ReplicationManager 
(ReplicationManager.java:handleUnderReplicatedContainer(489)) - Container #9 is 
under replicated. Expected replica count is 3, but found 2.
2019-07-23 16:57:38,336 INFO  container.ReplicationManager 
(ReplicationManager.java:sendReplicateCommand(652)) - Sending replicate 
container command for container #9 to datanode 
718da402-1433-4b44-8479-0a42e47929fd{ip: 127.0.0.1, host: vm-centos, 
networkLocation: /default-rack, certSerialId: null}
2019-07-23 16:57:38,336 INFO  container.ReplicationManager 
(ReplicationManager.java:handleUnderReplicatedContainer(489)) - Container #10 
is under replicated. Expected replica count is 3, but found 2.
2019-07-23 16:57:38,336 INFO  container.ReplicationManager 
(ReplicationManager.java:sendReplicateCommand(652)) - Sending replicate 
container command for container #10 to datanode 
718da402-1433-4b44-8479-0a42e47929fd{ip: 127.0.0.1, host: vm-centos, 
networkLocation: /default-rack, certSerialId: null}



> replication of underReplicated container fails with 
> SCMContainerPlacementRackAware policy
> -
>
> Key: HDDS-1751
> URL: https://issues.apache.org/jira/browse/HDDS-1751
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Sammi Chen
>Priority: Major
>  Labels: MiniOzoneChaosCluster
>
> SCM container replication fails with
> {code}
> 2019-07-02 18:26:41,564 WARN  container.ReplicationManager 
> (ReplicationManager.java:handleUnderReplicatedContainer(501)) - Exception 
> while replicating container 18.
> org.apache.hadoop.hdds.scm.exceptions.SCMException: No enough datanodes to 
> choose.
> at 
> org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackAware.chooseDatanodes(SCMContainerPlacementRackAware.java:100)
> at 
> org.apache.hadoop.hdds.scm.container.ReplicationManager.handleUnderReplicatedContainer(ReplicationManager.java:487)
> at 
> org.apache.hadoop.hdds.scm.container.ReplicationManager.processContainer(ReplicationManager.java:293)
> at 
> java.util.concurrent.ConcurrentHashMap$KeySetView.forEach(ConcurrentHashMap.java:4649)
> at 
> java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080)
> at 
> org.apache.hadoop.hdds.scm.container.ReplicationManager.run(ReplicationManager.java:205)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1751) replication of underReplicated container fails with SCMContainerPlacementRackAware policy

2019-07-23 Thread Sammi Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen reassigned HDDS-1751:


Assignee: Sammi Chen

> replication of underReplicated container fails with 
> SCMContainerPlacementRackAware policy
> -
>
> Key: HDDS-1751
> URL: https://issues.apache.org/jira/browse/HDDS-1751
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Sammi Chen
>Priority: Major
>  Labels: MiniOzoneChaosCluster
>
> SCM container replication fails with
> {code}
> 2019-07-02 18:26:41,564 WARN  container.ReplicationManager 
> (ReplicationManager.java:handleUnderReplicatedContainer(501)) - Exception 
> while replicating container 18.
> org.apache.hadoop.hdds.scm.exceptions.SCMException: No enough datanodes to 
> choose.
> at 
> org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackAware.chooseDatanodes(SCMContainerPlacementRackAware.java:100)
> at 
> org.apache.hadoop.hdds.scm.container.ReplicationManager.handleUnderReplicatedContainer(ReplicationManager.java:487)
> at 
> org.apache.hadoop.hdds.scm.container.ReplicationManager.processContainer(ReplicationManager.java:293)
> at 
> java.util.concurrent.ConcurrentHashMap$KeySetView.forEach(ConcurrentHashMap.java:4649)
> at 
> java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080)
> at 
> org.apache.hadoop.hdds.scm.container.ReplicationManager.run(ReplicationManager.java:205)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1809) Ozone Read fails with StatusRunTimeExceptions after 2 datanode fail in Ratis pipeline

2019-07-23 Thread Sammi Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890729#comment-16890729
 ] 

Sammi Chen commented on HDDS-1809:
--

Thanks [~shashikant] for reporting this issue. It has been fixed by the code 
change in HDDS-1713.
The root cause is that the network topology previously used the IP address as 
the node key in the topology cluster, so the three sorted datanodes all 
resolved to the same node. Now the datanode UUID is used as the node key, so 
the sorted datanodes are three distinct nodes.
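A tiny sketch of the difference described above (illustrative only; the accessor names are assumed from DatanodeDetails, and this is not the actual HDDS-1713 change):

{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.hadoop.hdds.protocol.DatanodeDetails;

// Keying the topology map by IP collapses co-located datanodes into one entry,
// while keying by UUID keeps each datanode distinct.
final class NodeKeySketch {
  static Map<String, DatanodeDetails> index(List<DatanodeDetails> dns,
      boolean byUuid) {
    Map<String, DatanodeDetails> map = new HashMap<>();
    for (DatanodeDetails dn : dns) {
      // With byUuid == false, datanodes sharing one IP overwrite each other.
      map.put(byUuid ? dn.getUuidString() : dn.getIpAddress(), dn);
    }
    return map;
  }
}
{code}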

> Ozone Read fails with StatusRunTimeExceptions after 2 datanode fail in Ratis 
> pipeline
> -
>
> Key: HDDS-1809
> URL: https://issues.apache.org/jira/browse/HDDS-1809
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Sammi Chen
>Priority: Major
> Fix For: 0.5.0
>
>
> {code:java}
> java.io.IOException: Unexpected OzoneException: java.io.IOException: 
> java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception
> at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:342)
> at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkFromContainer(ChunkInputStream.java:307)
> at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.prepareRead(ChunkInputStream.java:259)
> at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.read(ChunkInputStream.java:144)
> at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:239)
> at 
> org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171)
> at 
> org.apache.hadoop.ozone.client.io.OzoneInputStream.read(OzoneInputStream.java:47)
> at java.io.InputStream.read(InputStream.java:101)
> at 
> org.apache.hadoop.ozone.container.ContainerTestHelper.validateData(ContainerTestHelper.java:709)
> at 
> org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient.validateData(TestFailureHandlingByClient.java:458)
> at 
> org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient.testBlockWritesWithDnFailures(TestFailureHandlingByClient.java:158)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
> at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
> at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
> at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
> at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1653) Add option to "ozone scmcli printTopology" to order the output according to topology layer

2019-07-18 Thread Sammi Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-1653:
-
   Resolution: Fixed
Fix Version/s: 0.5.0
   Status: Resolved  (was: Patch Available)

> Add option to "ozone scmcli printTopology" to order the output according to 
> topology layer
> ---
>
> Key: HDDS-1653
> URL: https://issues.apache.org/jira/browse/HDDS-1653
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Sammi Chen
>Assignee: Xiaoyu Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Add an option to order the output according to topology layer.
> For example, for a /rack/node topology, we can show:
> State = HEALTHY
> /default-rack:
> ozone_datanode_1.ozone_default/172.18.0.3
> ozone_datanode_2.ozone_default/172.18.0.2
> ozone_datanode_3.ozone_default/172.18.0.4
> /rack1:
> ozone_datanode_4.ozone_default/172.18.0.5
> ozone_datanode_5.ozone_default/172.18.0.6
> For /dc/rack/node topology, we can either show
> State = HEALTHY
> /default-dc/default-rack:
> ozone_datanode_1.ozone_default/172.18.0.3
> ozone_datanode_2.ozone_default/172.18.0.2
> ozone_datanode_3.ozone_default/172.18.0.4
> /dc1/rack1:
> ozone_datanode_4.ozone_default/172.18.0.5
> ozone_datanode_5.ozone_default/172.18.0.6
> or
> State = HEALTHY
> default-dc:
> default-rack:
> ozone_datanode_1.ozone_default/172.18.0.3
> ozone_datanode_2.ozone_default/172.18.0.2
> ozone_datanode_3.ozone_default/172.18.0.4
> dc1:
> rack1:
> ozone_datanode_4.ozone_default/172.18.0.5
> ozone_datanode_5.ozone_default/172.18.0.6



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1787) NPE thrown while trying to find DN closest to client

2019-07-15 Thread Sammi Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885781#comment-16885781
 ] 

Sammi Chen commented on HDDS-1787:
--

Hi [~msingh], thanks for the instructions. I will try it locally. I have also 
created a unit test which reproduces the issue. 

> NPE thrown while trying to find DN closest to client
> 
>
> Key: HDDS-1787
> URL: https://issues.apache.org/jira/browse/HDDS-1787
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Siddharth Wagle
>Assignee: Sammi Chen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> cc: [~xyao] This seems related to the client side topology changes, not sure 
> if some other Jira is already addressing this.
> {code}
> 2019-07-10 16:45:53,176 WARN  ipc.Server (Server.java:logException(2724)) - 
> IPC Server handler 14 on 35066, call Call#127037 Retry#0 
> org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocol.send from 17
> 2.31.116.73:52540
> java.lang.NullPointerException
> at 
> org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.lambda$sortDatanodes$0(ScmBlockLocationProtocolServerSideTranslatorPB.java:215)
> at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
> at 
> java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
> at 
> org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.sortDatanodes(ScmBlockLocationProtocolServerSideTranslatorPB.java:215)
> at 
> org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:124)
> at 
> org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:13157)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> 2019-07-10 16:45:53,176 WARN  om.KeyManagerImpl 
> (KeyManagerImpl.java:lambda$sortDatanodeInPipeline$7(2129)) - Unable to sort 
> datanodes based on distance to client, volume=xqoyzocpse, bucket=vxwajaczqh, 
> key=pool-444-thread-7-201077822, client=127.0.0.1, 
> datanodes=[10f15723-45d7-4a0c-8f01-8b101744a110{ip: 172.31.116.73, host: 
> sid-minichaos.gce.cloudera.com, networkLocation: /default-rack, certSerialId: 
> null}, 7ac2777f-0a5c-4414-9e7f-bfbc47d696ea{ip: 172.31.116.73, host: 
> sid-minichaos.gce.cloudera.com, networkLocation: /default-rack, certSerialId: 
> null}], exception=java.lang.NullPointerException
> at 
> org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.lambda$sortDatanodes$0(ScmBlockLocationProtocolServerSideTranslatorPB.java:215)
> at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
> at 
> java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
> at 
> org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.sortDatanodes(ScmBlockLocationProtocolServerSideTranslatorPB.java:215)
> at 
> org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:124)
> at 
> org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:13157)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

[jira] [Commented] (HDDS-1787) NPE thrown while trying to find DN closest to client

2019-07-15 Thread Sammi Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885202#comment-16885202
 ] 

Sammi Chen commented on HDDS-1787:
--

Hi [~swagle], I uploaded a patch. Could you help review it? 

> NPE thrown while trying to find DN closest to client
> 
>
> Key: HDDS-1787
> URL: https://issues.apache.org/jira/browse/HDDS-1787
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Siddharth Wagle
>Assignee: Sammi Chen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> cc: [~xyao] This seems related to the client side topology changes, not sure 
> if some other Jira is already addressing this.
> {code}
> 2019-07-10 16:45:53,176 WARN  ipc.Server (Server.java:logException(2724)) - 
> IPC Server handler 14 on 35066, call Call#127037 Retry#0 
> org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocol.send from 17
> 2.31.116.73:52540
> java.lang.NullPointerException
> at 
> org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.lambda$sortDatanodes$0(ScmBlockLocationProtocolServerSideTranslatorPB.java:215)
> at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
> at 
> java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
> at 
> org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.sortDatanodes(ScmBlockLocationProtocolServerSideTranslatorPB.java:215)
> at 
> org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:124)
> at 
> org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:13157)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> 2019-07-10 16:45:53,176 WARN  om.KeyManagerImpl 
> (KeyManagerImpl.java:lambda$sortDatanodeInPipeline$7(2129)) - Unable to sort 
> datanodes based on distance to client, volume=xqoyzocpse, bucket=vxwajaczqh, 
> key=pool-444-thread-7-201077822, client=127.0.0.1, 
> datanodes=[10f15723-45d7-4a0c-8f01-8b101744a110{ip: 172.31.116.73, host: 
> sid-minichaos.gce.cloudera.com, networkLocation: /default-rack, certSerialId: 
> null}, 7ac2777f-0a5c-4414-9e7f-bfbc47d696ea{ip: 172.31.116.73, host: 
> sid-minichaos.gce.cloudera.com, networkLocation: /default-rack, certSerialId: 
> null}], exception=java.lang.NullPointerException
> at 
> org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.lambda$sortDatanodes$0(ScmBlockLocationProtocolServerSideTranslatorPB.java:215)
> at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
> at 
> java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
> at 
> org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.sortDatanodes(ScmBlockLocationProtocolServerSideTranslatorPB.java:215)
> at 
> org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:124)
> at 
> org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:13157)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> {code}
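
The NPE originates inside the stream lambda at ScmBlockLocationProtocolServerSideTranslatorPB.java:215, which suggests the lambda dereferences the result of a topology lookup that can return null (the WARN log above shows the client 127.0.0.1 may simply be unknown to the topology). Below is a minimal, hypothetical Java sketch of that failing pattern and a defensive variant; every class and method name here is illustrative and not taken from the Ozone source:

{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SortDatanodesSketch {

  // Stand-in for the network topology; hosts never registered map to null.
  static final Map<String, Integer> DISTANCE_BY_HOST = new HashMap<>();

  static Integer distanceTo(String host) {
    return DISTANCE_BY_HOST.get(host); // may return null
  }

  // Mirrors the failing pattern: the sort lambda dereferences a
  // possibly-null lookup result, throwing NullPointerException.
  static List<String> sortUnsafe(List<String> hosts) {
    List<String> sorted = new ArrayList<>(hosts);
    sorted.sort(Comparator.comparingInt(h -> distanceTo(h).intValue()));
    return sorted;
  }

  // Defensive variant: hosts unknown to the topology sort last
  // instead of aborting the whole RPC with an NPE.
  static List<String> sortSafe(List<String> hosts) {
    List<String> sorted = new ArrayList<>(hosts);
    sorted.sort(Comparator.comparingInt(h -> {
      Integer d = distanceTo(h);
      return d == null ? Integer.MAX_VALUE : d.intValue();
    }));
    return sorted;
  }

  public static void main(String[] args) {
    DISTANCE_BY_HOST.put("dn1", 2);
    // "dn2" was never registered in the topology
    System.out.println(sortSafe(List.of("dn2", "dn1")));   // [dn1, dn2]
    System.out.println(sortUnsafe(List.of("dn2", "dn1"))); // NPE, as above
  }
}
{code}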



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (HDDS-1787) NPE thrown while trying to find DN closest to client

2019-07-15 Thread Sammi Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884825#comment-16884825
 ] 

Sammi Chen edited comment on HDDS-1787 at 7/15/19 9:01 AM:
---

Hi [~swagle], I would like to know how to run the MiniOzoneChaos cluster to 
verify the issue is fixed.  TestMiniChaosOzoneCluster cannot reproduce the 
issue.


was (Author: sammi):
Hi [~swagle], I would like to know how to run the MiniOzoneChaos cluster to 
verify the issue is fixed. 

> NPE thrown while trying to find DN closest to client
> 
>
> Key: HDDS-1787
> URL: https://issues.apache.org/jira/browse/HDDS-1787
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Siddharth Wagle
>Assignee: Sammi Chen
>Priority: Major
>
> cc: [~xyao] This seems related to the client side topology changes, not sure 
> if some other Jira is already addressing this.
> {code}
> 2019-07-10 16:45:53,176 WARN  ipc.Server (Server.java:logException(2724)) - 
> IPC Server handler 14 on 35066, call Call#127037 Retry#0 
> org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocol.send from 17
> 2.31.116.73:52540
> java.lang.NullPointerException
> at 
> org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.lambda$sortDatanodes$0(ScmBlockLocationProtocolServerSideTranslatorPB.java:215)
> at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
> at 
> java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
> at 
> org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.sortDatanodes(ScmBlockLocationProtocolServerSideTranslatorPB.java:215)
> at 
> org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:124)
> at 
> org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:13157)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> 2019-07-10 16:45:53,176 WARN  om.KeyManagerImpl 
> (KeyManagerImpl.java:lambda$sortDatanodeInPipeline$7(2129)) - Unable to sort 
> datanodes based on distance to client, volume=xqoyzocpse, bucket=vxwajaczqh, 
> key=pool-444-thread-7-201077822, client=127.0.0.1, 
> datanodes=[10f15723-45d7-4a0c-8f01-8b101744a110{ip: 172.31.116.73, host: 
> sid-minichaos.gce.cloudera.com, networkLocation: /default-rack, certSerialId: 
> null}, 7ac2777f-0a5c-4414-9e7f-bfbc47d696ea{ip: 172.31.116.73, host: 
> sid-minichaos.gce.cloudera.com, networkLocation: /default-rack, certSerialId: 
> null}], exception=java.lang.NullPointerException
> at 
> org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.lambda$sortDatanodes$0(ScmBlockLocationProtocolServerSideTranslatorPB.java:215)
> at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
> at 
> java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
> at 
> org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.sortDatanodes(ScmBlockLocationProtocolServerSideTranslatorPB.java:215)
> at 
> org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:124)
> at 
> org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:13157)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
> at java.security.AccessController.doPrivileged(Native Method)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> {code}
> 

[jira] [Commented] (HDDS-1787) NPE thrown while trying to find DN closest to client

2019-07-14 Thread Sammi Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884825#comment-16884825
 ] 

Sammi Chen commented on HDDS-1787:
--

Hi [~swagle], I would like to know how to run the MiniOzoneChaos cluster to 
verify the issue is fixed. 

> NPE thrown while trying to find DN closest to client
> 
>
> Key: HDDS-1787
> URL: https://issues.apache.org/jira/browse/HDDS-1787
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Siddharth Wagle
>Assignee: Sammi Chen
>Priority: Major
>
> cc: [~xyao] This seems related to the client side topology changes, not sure 
> if some other Jira is already addressing this.
> {code}
> 2019-07-10 16:45:53,176 WARN  ipc.Server (Server.java:logException(2724)) - 
> IPC Server handler 14 on 35066, call Call#127037 Retry#0 
> org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocol.send from 17
> 2.31.116.73:52540
> java.lang.NullPointerException
> at 
> org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.lambda$sortDatanodes$0(ScmBlockLocationProtocolServerSideTranslatorPB.java:215)
> at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
> at 
> java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
> at 
> org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.sortDatanodes(ScmBlockLocationProtocolServerSideTranslatorPB.java:215)
> at 
> org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:124)
> at 
> org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:13157)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> 2019-07-10 16:45:53,176 WARN  om.KeyManagerImpl 
> (KeyManagerImpl.java:lambda$sortDatanodeInPipeline$7(2129)) - Unable to sort 
> datanodes based on distance to client, volume=xqoyzocpse, bucket=vxwajaczqh, 
> key=pool-444-thread-7-201077822, client=127.0.0.1, 
> datanodes=[10f15723-45d7-4a0c-8f01-8b101744a110{ip: 172.31.116.73, host: 
> sid-minichaos.gce.cloudera.com, networkLocation: /default-rack, certSerialId: 
> null}, 7ac2777f-0a5c-4414-9e7f-bfbc47d696ea{ip: 172.31.116.73, host: 
> sid-minichaos.gce.cloudera.com, networkLocation: /default-rack, certSerialId: 
> null}], exception=java.lang.NullPointerException
> at 
> org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.lambda$sortDatanodes$0(ScmBlockLocationProtocolServerSideTranslatorPB.java:215)
> at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
> at 
> java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
> at 
> org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.sortDatanodes(ScmBlockLocationProtocolServerSideTranslatorPB.java:215)
> at 
> org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:124)
> at 
> org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:13157)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Updated] (HDDS-1553) Add metrics in rack aware container placement policy

2019-07-11 Thread Sammi Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-1553:
-
Description: 
To collect the following statistics:
1. total requested datanode count (A)
2. successfully allocated datanode count without constraint compromise (B)
3. successfully allocated datanode count with some constraint compromise (C)

B includes C; failed allocations = (A - B)

  was:
To collect the following statistics:
1. total requested datanode count (A)
2. successfully allocated datanode count without constraint compromise (B)
3. successfully allocated datanode count with some constraint compromise (C)
4. failed-to-allocate datanode count (D)

A = B + C + D, B includes C


> Add metrics in rack aware container placement policy
> 
>
> Key: HDDS-1553
> URL: https://issues.apache.org/jira/browse/HDDS-1553
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Major
>
> To collect the following statistics:
> 1. total requested datanode count (A)
> 2. successfully allocated datanode count without constraint compromise (B)
> 3. successfully allocated datanode count with some constraint compromise (C)
> B includes C; failed allocations = (A - B)
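
As a rough illustration of how these counters and the derived failure count could be wired together, here is a hypothetical Java sketch; the class and method names are invented for this example and are not the names used by the eventual patch:

{code}
import java.util.concurrent.atomic.AtomicLong;

public class PlacementMetricsSketch {
  private final AtomicLong requested   = new AtomicLong(); // A
  private final AtomicLong allocated   = new AtomicLong(); // B (includes C)
  private final AtomicLong compromised = new AtomicLong(); // C

  void onRequest(int datanodeCount) {
    requested.addAndGet(datanodeCount);
  }

  void onAllocated(boolean constraintCompromised) {
    allocated.incrementAndGet();
    if (constraintCompromised) {
      compromised.incrementAndGet();
    }
  }

  // Failed allocations are derived rather than counted: A - B.
  long failedAllocations() {
    return requested.get() - allocated.get();
  }

  public static void main(String[] args) {
    PlacementMetricsSketch m = new PlacementMetricsSketch();
    m.onRequest(3);
    m.onAllocated(false); // placed with all constraints satisfied
    m.onAllocated(true);  // placed, but a constraint was relaxed
    System.out.println("failed = " + m.failedAllocations()); // prints 1
  }
}
{code}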



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1553) Add metrics in rack aware container placement policy

2019-07-11 Thread Sammi Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-1553:
-
Description: 
To collect the following statistics:
1. total requested datanode count (A)
2. successfully allocated datanode count without constraint compromise (B)
3. successfully allocated datanode count with some constraint compromise (C)
4. failed-to-allocate datanode count (D)

A = B + C + D, B includes C

  was:
To collect the following statistics:
1. total requested datanode count (A)
2. successfully allocated datanode count without constraint compromise (B)
3. successfully allocated datanode count with some constraint compromise (C)
4. failed-to-allocate datanode count (D)

A = B + C + D


> Add metrics in rack aware container placement policy
> 
>
> Key: HDDS-1553
> URL: https://issues.apache.org/jira/browse/HDDS-1553
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Major
>
> To collect the following statistics:
> 1. total requested datanode count (A)
> 2. successfully allocated datanode count without constraint compromise (B)
> 3. successfully allocated datanode count with some constraint compromise (C)
> 4. failed-to-allocate datanode count (D)
> A = B + C + D, B includes C



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1663) Add datanode to network topology cluster during node register

2019-06-09 Thread Sammi Chen (JIRA)
Sammi Chen created HDDS-1663:


 Summary: Add datanode to network topology cluster during node 
register
 Key: HDDS-1663
 URL: https://issues.apache.org/jira/browse/HDDS-1663
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Sammi Chen
Assignee: Sammi Chen






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1662) Missing test resources of integration-test project in target directory after compile

2019-06-09 Thread Sammi Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-1662:
-
Issue Type: Sub-task  (was: Bug)
Parent: HDDS-698

> Missing test resources of integration-test project in target directory after 
> compile
> -
>
> Key: HDDS-1662
> URL: https://issues.apache.org/jira/browse/HDDS-1662
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Major
>
> The integration-test project's original test resources are missing from the 
> target directory after compilation. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1662) Missing test resources of integration-test project in target directory after compile

2019-06-08 Thread Sammi Chen (JIRA)
Sammi Chen created HDDS-1662:


 Summary: Missing test resources of integration-test project in 
target directory after compile
 Key: HDDS-1662
 URL: https://issues.apache.org/jira/browse/HDDS-1662
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Sammi Chen
Assignee: Sammi Chen


The integration-test project's original test resources are missing from the 
target directory after compilation. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1661) Consolidate hadoop-hdds and hadoop-ozone into hadoop-ozone-project

2019-06-08 Thread Sammi Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen reassigned HDDS-1661:


Assignee: (was: Sammi Chen)

> Consolidate hadoop-hdds and hadoop-ozone into hadoop-ozone-project
> --
>
> Key: HDDS-1661
> URL: https://issues.apache.org/jira/browse/HDDS-1661
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Eric Yang
>Priority: Major
>
> The Ozone source code is somewhat fragmented within the Hadoop source tree. 
> The current layout looks like:
> {code}
> hadoop/pom.ozone.xml
> ├── hadoop-hdds
> └── hadoop-ozone
> {code}
> It would be helpful to consolidate the project into a high-level grouping such as:
> {code}
> hadoop
> └── hadoop-ozone-project/pom.xml
> └── hadoop-ozone-project/hadoop-hdds
> └── hadoop-ozone-project/hadoop-ozone
> {code}
> This allows users to build Ozone from the hadoop-ozone-project directory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1661) Consolidate hadoop-hdds and hadoop-ozone into hadoop-ozone-project

2019-06-08 Thread Sammi Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen reassigned HDDS-1661:


Assignee: Sammi Chen  (was: Bharat Viswanadham)

> Consolidate hadoop-hdds and hadoop-ozone into hadoop-ozone-project
> --
>
> Key: HDDS-1661
> URL: https://issues.apache.org/jira/browse/HDDS-1661
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Eric Yang
>Assignee: Sammi Chen
>Priority: Major
>
> The Ozone source code is somewhat fragmented within the Hadoop source tree. 
> The current layout looks like:
> {code}
> hadoop/pom.ozone.xml
> ├── hadoop-hdds
> └── hadoop-ozone
> {code}
> It would be helpful to consolidate the project into a high-level grouping such as:
> {code}
> hadoop
> └── hadoop-ozone-project/pom.xml
> └── hadoop-ozone-project/hadoop-hdds
> └── hadoop-ozone-project/hadoop-ozone
> {code}
> This allows users to build Ozone from the hadoop-ozone-project directory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1653) Add option to "ozone scmcli printTopology" to order the output according to topology layer

2019-06-05 Thread Sammi Chen (JIRA)
Sammi Chen created HDDS-1653:


 Summary: Add option to "ozone scmcli printTopology" to order the 
output according to topology layer
 Key: HDDS-1653
 URL: https://issues.apache.org/jira/browse/HDDS-1653
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Sammi Chen


Add an option to order the output according to topology layer.
For example, for a /rack/node topology, we can show:
State = HEALTHY
/default-rack:
ozone_datanode_1.ozone_default/172.18.0.3
ozone_datanode_2.ozone_default/172.18.0.2
ozone_datanode_3.ozone_default/172.18.0.4
/rack1:
ozone_datanode_4.ozone_default/172.18.0.5
ozone_datanode_5.ozone_default/172.18.0.6
For a /dc/rack/node topology, we can show either:
State = HEALTHY
/default-dc/default-rack:
ozone_datanode_1.ozone_default/172.18.0.3
ozone_datanode_2.ozone_default/172.18.0.2
ozone_datanode_3.ozone_default/172.18.0.4
/dc1/rack1:
ozone_datanode_4.ozone_default/172.18.0.5
ozone_datanode_5.ozone_default/172.18.0.6

or

State = HEALTHY
default-dc:
default-rack:
ozone_datanode_1.ozone_default/172.18.0.3
ozone_datanode_2.ozone_default/172.18.0.2
ozone_datanode_3.ozone_default/172.18.0.4
dc1:
rack1:
ozone_datanode_4.ozone_default/172.18.0.5
ozone_datanode_5.ozone_default/172.18.0.6
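
A hypothetical Java sketch of the grouping this option implies: collect the datanodes by their network location and print each group under its path. All names below are illustrative and not taken from the scmcli code:

{code}
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class PrintTopologySketch {

  record Node(String name, String location) {}

  static void printByLayer(List<Node> nodes) {
    // TreeMap keeps network locations sorted, so /default-rack
    // prints before /rack1, matching the sample output above.
    Map<String, List<Node>> byLocation = nodes.stream()
        .collect(Collectors.groupingBy(Node::location,
            TreeMap::new, Collectors.toList()));
    System.out.println("State = HEALTHY");
    byLocation.forEach((location, group) -> {
      System.out.println(location + ":");
      group.forEach(n -> System.out.println(n.name()));
    });
  }

  public static void main(String[] args) {
    printByLayer(List.of(
        new Node("ozone_datanode_1.ozone_default/172.18.0.3", "/default-rack"),
        new Node("ozone_datanode_4.ozone_default/172.18.0.5", "/rack1"),
        new Node("ozone_datanode_2.ozone_default/172.18.0.2", "/default-rack")));
  }
}
{code}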



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


