[jira] [Updated] (HDDS-1572) Implement a Pipeline scrubber to maintain healthy number of pipelines in a cluster
[ https://issues.apache.org/jira/browse/HDDS-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Cheng updated HDDS-1572:
---------------------------
Description:
The design document describes the initial requirements for the pipeline scrubber:
- Scan the pipelines that each node is a part of to select candidates for teardown.
- Scan pipelines that have no open containers currently in use and whose datanodes are in violation.
- Schedule a teardown operation if a candidate pipeline is found.

was:
The design document describes the initial requirements for the pipeline scrubber:
- Maintain a data structure for datanodes violating the pipeline membership soft upper bound.
- Scan the pipelines that each node is a part of to select candidates for teardown.
- Scan pipelines that have no open containers currently in use and whose datanodes are in violation.
- Schedule a teardown operation if a candidate pipeline is found.

> Implement a Pipeline scrubber to maintain healthy number of pipelines in a cluster
>
>          Key: HDDS-1572
>          URL: https://issues.apache.org/jira/browse/HDDS-1572
>      Project: Hadoop Distributed Data Store
>   Issue Type: Sub-task
>   Components: SCM
>     Reporter: Siddharth Wagle
>     Assignee: Li Cheng
>     Priority: Major
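To make the flow concrete, here is a minimal sketch of one scrubber pass over these requirements. Every type and method name in it (PipelineManager, hasOpenContainers, isOverPipelineLimit, scheduleTeardown) is a hypothetical stand-in, not the actual SCM API:

{code:java}
import java.util.List;

// All interfaces below are illustrative stubs sketching the requirements
// above; they are not the real SCM types.
interface DatanodeDetails {}

interface Pipeline {
  List<DatanodeDetails> getNodes();
}

interface PipelineManager {
  List<Pipeline> getPipelines();
  boolean hasOpenContainers(Pipeline pipeline);
  boolean isOverPipelineLimit(DatanodeDetails dn);
  void scheduleTeardown(Pipeline pipeline);
}

class PipelineScrubber implements Runnable {
  private final PipelineManager pipelineManager;

  PipelineScrubber(PipelineManager pipelineManager) {
    this.pipelineManager = pipelineManager;
  }

  @Override
  public void run() {
    for (Pipeline pipeline : pipelineManager.getPipelines()) {
      // Candidate for teardown: no open containers currently in use, and
      // at least one member datanode violates the membership soft upper
      // bound.
      boolean idle = !pipelineManager.hasOpenContainers(pipeline);
      boolean violating = pipeline.getNodes().stream()
          .anyMatch(pipelineManager::isOverPipelineLimit);
      if (idle && violating) {
        pipelineManager.scheduleTeardown(pipeline);
      }
    }
  }
}
{code}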
[jira] [Assigned] (HDDS-1572) Implement a Pipeline scrubber to maintain healthy number of pipelines in a cluster
[ https://issues.apache.org/jira/browse/HDDS-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Cheng reassigned HDDS-1572:
------------------------------
    Assignee: Li Cheng
[jira] [Assigned] (HDDS-1573) Add scrubber metrics and pipeline metrics
[ https://issues.apache.org/jira/browse/HDDS-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Cheng reassigned HDDS-1573:
------------------------------
    Assignee: Li Cheng

> Add scrubber metrics and pipeline metrics
>
>          Key: HDDS-1573
>          URL: https://issues.apache.org/jira/browse/HDDS-1573
>      Project: Hadoop Distributed Data Store
>   Issue Type: Sub-task
>   Components: SCM
>     Reporter: Siddharth Wagle
>     Assignee: Li Cheng
>     Priority: Major
>
> - Add a metric for the number of pipelines per datanode.
> - Add a metric for pipelines that are chosen by the scrubber.
> - Add a metric for pipelines that are in violation.
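A sketch of how such a metrics source could be wired up with the Hadoop metrics2 annotation pattern that Ozone components generally follow; the class name and individual metric names below are assumptions, not the committed implementation:

{code:java}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;
import org.apache.hadoop.metrics2.lib.MutableGaugeLong;

// Hypothetical metrics source for the scrubber; the metrics2 framework
// instantiates the @Metric-annotated fields when the source is registered.
@Metrics(about = "Pipeline scrubber metrics", context = "dfs")
public class ScrubberMetrics {

  @Metric("Pipelines chosen by the scrubber for teardown")
  private MutableCounterLong numPipelinesScrubbed;

  @Metric("Pipelines whose datanodes violate the membership limit")
  private MutableGaugeLong numPipelinesInViolation;

  public static ScrubberMetrics create() {
    return DefaultMetricsSystem.instance().register(
        "ScrubberMetrics", "Pipeline scrubber metrics",
        new ScrubberMetrics());
  }

  public void incrPipelinesScrubbed() {
    numPipelinesScrubbed.incr();
  }

  public void setPipelinesInViolation(long count) {
    numPipelinesInViolation.set(count);
  }
}
{code}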
[jira] [Resolved] (HDDS-2396) OM rocksdb core dump during writing
[ https://issues.apache.org/jira/browse/HDDS-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Cheng resolved HDDS-2396.
----------------------------
    Assignee: Li Cheng
  Resolution: Fixed

> OM rocksdb core dump during writing
>
>          Key: HDDS-2396
>          URL: https://issues.apache.org/jira/browse/HDDS-2396
>      Project: Hadoop Distributed Data Store
>   Issue Type: Bug
>   Components: Ozone Manager
> Affects Versions: 0.4.1
>     Reporter: Li Cheng
>     Assignee: Li Cheng
>     Priority: Major
>  Attachments: hs_err_pid9340.log
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say it's VM0.
> I use goofys as a FUSE client with the Ozone S3 gateway enabled to mount Ozone to a path on VM0, then read data from VM0's local disk and write it to the mount path. The dataset contains ~50,000 files of various sizes, from 0 bytes to GB-scale.
>
> RocksDB occasionally core dumps during writes:
>
> Stack: [0x7f5891a23000,0x7f5891b24000], sp=0x7f5891b21bb8, free space=1018k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
> C [libc.so.6+0x151d60] __memmove_ssse3_back+0x1ae0
> C [librocksdbjni3192271038586903156.so+0x358fec] rocksdb::MemTableInserter::PutCFImpl(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::ValueType)+0x51c
> C [librocksdbjni3192271038586903156.so+0x359d17] rocksdb::MemTableInserter::PutCF(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&)+0x17
> C [librocksdbjni3192271038586903156.so+0x3513bc] rocksdb::WriteBatch::Iterate(rocksdb::WriteBatch::Handler*) const+0x45c
> C [librocksdbjni3192271038586903156.so+0x354df9] rocksdb::WriteBatchInternal::InsertInto(rocksdb::WriteThread::WriteGroup&, unsigned long, rocksdb::ColumnFamilyMemTables*, rocksdb::FlushScheduler*, bool, unsigned long, rocksdb::DB*, bool, bool, bool)+0x1f9
> C [librocksdbjni3192271038586903156.so+0x29fd79] rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x24b9
> C [librocksdbjni3192271038586903156.so+0x2a0431] rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x21
> C [librocksdbjni3192271038586903156.so+0x1a064c] Java_org_rocksdb_RocksDB_write0+0xcc
> J 7899 org.rocksdb.RocksDB.write0(JJJ)V (0 bytes) @ 0x7f58f1872dbe [0x7f58f1872d00+0xbe]
> J 10093% C1 org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions()V (400 bytes) @ 0x7f58f2308b0c [0x7f58f2307a40+0x10cc]
> j org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer$$Lambda$29.run()V+4
> j java.lang.Thread.run()V+11
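For context, the JNI write path implicated in these frames (OzoneManagerDoubleBuffer.flushTransactions batching OM transactions and handing them to RocksDB.write0) boils down to the rocksdbjni pattern below. This is a self-contained sketch; the database path and key/value contents are placeholders:

{code:java}
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.WriteBatch;
import org.rocksdb.WriteOptions;

public class RocksDbBatchWrite {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (Options options = new Options().setCreateIfMissing(true);
         RocksDB db = RocksDB.open(options, "/tmp/om-test-db");
         WriteBatch batch = new WriteBatch();
         WriteOptions writeOptions = new WriteOptions()) {
      // The MemTableInserter::PutCF frames in the dump correspond to these
      // batched puts being replayed into the memtable during db.write().
      batch.put("key1".getBytes(), "value1".getBytes());
      batch.put("key2".getBytes(), "value2".getBytes());
      db.write(writeOptions, batch);
    }
  }
}
{code}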
[jira] [Commented] (HDDS-2396) OM rocksdb core dump during writing
[ https://issues.apache.org/jira/browse/HDDS-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976315#comment-16976315 ]

Li Cheng commented on HDDS-2396:
--------------------------------

This is resolved by [https://github.com/apache/hadoop-ozone/pull/100]. Thanks to [~bharat] for the fix.
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976258#comment-16976258 ]

Li Cheng commented on HDDS-2356:
--------------------------------

I checked out [https://github.com/apache/hadoop-ozone/pull/163], compiled a jar, and deployed it onto my cluster. [~bharat]

> Multipart upload report errors while writing to ozone Ratis pipeline
>
>          Key: HDDS-2356
>          URL: https://issues.apache.org/jira/browse/HDDS-2356
>      Project: Hadoop Distributed Data Store
>   Issue Type: Bug
>   Components: Ozone Manager
> Affects Versions: 0.4.1
>  Environment: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM
>     Reporter: Li Cheng
>     Assignee: Bharat Viswanadham
>     Priority: Blocker
>      Fix For: 0.5.0
>  Attachments: 2018-11-15-OM-logs.txt, 2019-11-06_18_13_57_422_ERROR, hs_err_pid9340.log, image-2019-10-31-18-56-56-177.png, om-audit-VM_50_210_centos.log, om_audit_log_plc_1570863541668_9278.txt
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say it's VM0.
> I use goofys as a FUSE client with the Ozone S3 gateway enabled to mount Ozone to a path on VM0, then read data from VM0's local disk and write it to the mount path. The dataset contains ~50,000 files of various sizes, from 0 bytes to GB-scale.
> Writing is slow (1 GB in ~10 minutes) and stops after around 4 GB. In the hadoop-root-om-VM_50_210_centos.out log, OM throws errors related to multipart upload. These errors eventually cause the writing to terminate and OM to shut down.
>
> Updated on 11/06/2019:
> Seeing a new multipart upload error, NO_SUCH_MULTIPART_UPLOAD_ERROR; full logs are in the attachment.
> 2019-11-05 18:12:37,766 ERROR org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest: MultipartUpload Commit is failed for Key:./20191012/plc_1570863541668_9278 in Volume/Bucket s325d55ad283aa400af464c76d713c07ad/ozone-test
> NO_SUCH_MULTIPART_UPLOAD_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: No such Multipart upload is with specified uploadId fcda8608-b431-48b7-8386-0a332f1a709a-103084683261641950
> at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:156)
> at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.java:217)
> at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:132)
> at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72)
> at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:100)
> at org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
>
> Updated on 10/28/2019:
> Seeing a MISMATCH_MULTIPART_LIST error:
> 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete Multipart Upload Request for bucket: ozone-test, key: 20191012/plc_1570863541668_9278
> MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: Complete Multipart Upload Failed: volume: s3c89e813c80ffcea9543004d57b2a1239bucket: ozone-testkey: 20191012/plc_1570863541668_9278
> at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732)
> at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.completeMultipartUpload(OzoneManagerProtocolClientSideTranslatorPB.java:1104)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at
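For reference, the S3 multipart sequence that goofys drives through the Ozone S3 gateway looks roughly like the sketch below (AWS SDK for Java; the endpoint, bucket, key, and part file are placeholders). NO_SUCH_MULTIPART_UPLOAD_ERROR corresponds to a part commit (step 2) arriving with an uploadId for which OM has no initiate record (step 1); a part-list mismatch at step 3 surfaces as MISMATCH_MULTIPART_LIST:

{code:java}
import java.io.File;
import java.util.ArrayList;
import java.util.List;

import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest;
import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
import com.amazonaws.services.s3.model.PartETag;
import com.amazonaws.services.s3.model.UploadPartRequest;

public class MultipartUploadSketch {
  public static void main(String[] args) {
    // Point the SDK at the Ozone S3 gateway (placeholder endpoint).
    AmazonS3 s3 = AmazonS3ClientBuilder.standard()
        .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(
            "http://vm0:9878", "us-east-1"))
        .enablePathStyleAccess()
        .build();

    String bucket = "ozone-test";
    String key = "20191012/plc_1570863541668_9278";

    // 1. Initiate: OM records the uploadId for this key.
    String uploadId = s3.initiateMultipartUpload(
        new InitiateMultipartUploadRequest(bucket, key)).getUploadId();

    // 2. Upload parts: every part commit must carry the same uploadId.
    List<PartETag> partETags = new ArrayList<>();
    partETags.add(s3.uploadPart(new UploadPartRequest()
        .withBucketName(bucket).withKey(key)
        .withUploadId(uploadId)
        .withPartNumber(1)
        .withFile(new File("/data/part-00001"))).getPartETag());

    // 3. Complete: OM validates the submitted part list against what it
    // recorded for this uploadId.
    s3.completeMultipartUpload(new CompleteMultipartUploadRequest(
        bucket, key, uploadId, partETags));
  }
}
{code}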
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976255#comment-16976255 ]

Li Cheng commented on HDDS-2356:
--------------------------------

[~bharat] Makes sense. I failed to find anything related to Key:plc_1570869510243_5542 in the OM audit logs; it might have been rotated. I can try again today to get some fresh logs.
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976237#comment-16976237 ]

Li Cheng commented on HDDS-2356:
--------------------------------

Tried running with [https://github.com/apache/hadoop-ozone/pull/163]. It did last longer, which is a good sign, but eventually it still failed with a NO_SUCH_MULTIPART_UPLOAD_ERROR. The logs are attached. Interestingly, the first error happened earlier but did not prevent writing; after a few hours the run failed, and below is the last error log.

2019-11-15 22:13:56,493 ERROR org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest: MultipartUpload Commit is failed for Key:plc_1570869510243_5542 in Volume/Bucket s325d55ad283aa400af464c76d713c07ad/ozone-test
NO_SUCH_MULTIPART_UPLOAD_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: No such Multipart upload is with specified uploadId 69162f8b-a923-4247-bb67-b1d6f9fa0d9d-103141824303150377
at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:159)
at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.java:217)
at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:132)
at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72)
at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:100)
at org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
[jira] [Updated] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Cheng updated HDDS-2356:
---------------------------
    Attachment: 2018-11-15-OM-logs.txt
[jira] [Resolved] (HDDS-2492) Fix test clean up issue in TestSCMPipelineManager
[ https://issues.apache.org/jira/browse/HDDS-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Cheng resolved HDDS-2492.
----------------------------
Fix Version/s: 0.4.1
   Resolution: Fixed

> Fix test clean up issue in TestSCMPipelineManager
>
>          Key: HDDS-2492
>          URL: https://issues.apache.org/jira/browse/HDDS-2492
>      Project: Hadoop Distributed Data Store
>   Issue Type: Bug
>     Reporter: Sammi Chen
>     Assignee: Li Cheng
>     Priority: Major
>       Labels: pull-request-available
>      Fix For: 0.4.1
>   Time Spent: 10m
>   Remaining Estimate: 0h
>
> This was opened based on [~sammichen]'s investigation on HDDS-2034:
> {quote}Failure is caused by the newly introduced function TestSCMPipelineManager#testPipelineOpenOnlyWhenLeaderReported, which doesn't close pipelineManager at the end. It's better to fix it in a new JIRA.
> {quote}
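A minimal sketch of the shape of such a fix, assuming the usual try/finally cleanup pattern; createPipelineManager() and the stub PipelineManager interface are hypothetical stand-ins for the test's actual setup, which builds an SCMPipelineManager from its configuration, node manager, and event queue:

{code:java}
import org.junit.Test;

public class TestScmPipelineManagerCleanup {

  // Illustrative stub; the real test uses the SCM pipeline manager type.
  interface PipelineManager extends AutoCloseable {
    @Override
    void close();
  }

  private PipelineManager createPipelineManager() {
    // Placeholder for the real test's setup code.
    return () -> { };
  }

  @Test
  public void testPipelineOpenOnlyWhenLeaderReported() {
    PipelineManager pipelineManager = createPipelineManager();
    try {
      // ... existing test body: open pipelines only after the leader is
      // reported, then assert on their state ...
    } finally {
      // The missing step: close the manager even when assertions fail,
      // so its resources do not leak into subsequent tests in the class.
      pipelineManager.close();
    }
  }
}
{code}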
[jira] [Work started] (HDDS-2492) Fix test clean up issue in TestSCMPipelineManager
[ https://issues.apache.org/jira/browse/HDDS-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HDDS-2492 started by Li Cheng.
--------------------------------------
[jira] [Commented] (HDDS-2492) Fix test clean up issue in TestSCMPipelineManager
[ https://issues.apache.org/jira/browse/HDDS-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974837#comment-16974837 ]

Li Cheng commented on HDDS-2492:
--------------------------------

[https://github.com/apache/hadoop-ozone/pull/179]
[jira] [Updated] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Cheng updated HDDS-2356:
---------------------------
    Attachment: om-audit-VM_50_210_centos.log
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16973103#comment-16973103 ]

Li Cheng commented on HDDS-2356:
--------------------------------

[~bharat] I ran the test again and now see the error in the audit log:

]} | ret=FAILURE | NO_SUCH_MULTIPART_UPLOAD_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: No such Multipart upload is with specified uploadId bd6579f4-22ce-4c15-8402-e979f1251b13-103129369238110745
at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:156)
at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.java:217)
...skipping...
uuid: "79bf7bdf-ed29-49d4-bf7c-e88fdbd2ce03"
ipAddress: "9.134.51.215"
hostName: "9.134.51.215"
ports { name: "RATIS" value: 9858 }
ports { name: "STANDALONE" value: 9859 }
networkName: "79bf7bdf-ed29-49d4-bf7c-e88fdbd2ce03"
networkLocation: "/default-rack"
} state: PIPELINE_OPEN type: RATIS factor: THREE id { id: "ec6b06c5-193f-4c30-879b-5a12284dc4f8" } }
]} | ret=FAILURE | NO_SUCH_MULTIPART_UPLOAD_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: No such Multipart upload is with specified uploadId bd6579f4-22ce-4c15-8402-e979f1251b13-103129369238110745
at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:156)
at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.java:217)

I'm uploading the entire OM audit log as an attachment.
[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971290#comment-16971290 ]

Li Cheng edited comment on HDDS-2356 at 11/11/19 7:21 AM:
----------------------------------------------------------

[~bharat] In terms of the key in the last stacktrace:

2019-11-08 20:08:24,832 ERROR org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCompleteRequest: MultipartUpload Complete request failed for Key: plc_1570863541668_9278 in Volume/Bucket s325d55ad283aa400af464c76d713c07ad/ozone-test
INVALID_PART org.apache.hadoop.ozone.om.exceptions.OMException: Complete Multipart Upload Failed: volume: s325d55ad283aa400af464c76d713c07adbucket: ozone-testkey: plc_1570863541668_9278

The OM audit logs show a bunch of entries for key plc_1570863541668_9278 with different clientIDs, for instance:

2019-11-08 20:19:56,241 | INFO | OMAudit | user=root | ip=9.134.50.210 | op=ALLOCATE_BLOCK \{volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, key=plc_1570863541668_9278, dataSize=5242880, replicationType=RATIS, replicationFactor=THREE, keyLocationInfo=[], clientID=103102209394803336} | ret=SUCCESS |

I'm uploading the entire OM audit logs for this key plc_1570863541668_9278. The file's size:

[root@VM_50_210_centos /data/idex_data/zip]# ls -altr -h ./20191012/plc_1570863541668_9278
-rw-r--r-- 1 1003 users 1.4G Oct 22 10:33 ./20191012/plc_1570863541668_9278

You can try creating a file of a similar size on your own and see whether it reproduces the issue. Please refer to the description for env details.

was (Author: timmylicheng):

[~bharat] In terms of the key in the last stacktrace:

2019-11-08 20:08:24,832 ERROR org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCompleteRequest: MultipartUpload Complete request failed for Key: plc_1570863541668_9278 in Volume/Bucket s325d55ad283aa400af464c76d713c07ad/ozone-test
INVALID_PART org.apache.hadoop.ozone.om.exceptions.OMException: Complete Multipart Upload Failed: volume: s325d55ad283aa400af464c76d713c07adbucket: ozone-testkey: plc_1570863541668_9278

The OM audit logs show a bunch of entries for key plc_1570863541668_9278 with different clientIDs, for instance:

2019-11-08 20:19:56,241 | INFO | OMAudit | user=root | ip=9.134.50.210 | op=ALLOCATE_BLOCK \{volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, key=plc_1570863541668_9278, dataSize=5242880, replicationType=RATIS, replicationFactor=THREE, keyLocationInfo=[], clientID=103102209394803336} | ret=SUCCESS |

I'm uploading the entire OM audit logs for this key plc_1570863541668_9278.
[jira] [Updated] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Cheng updated HDDS-2356:
---------------------------
    Attachment: om_audit_log_plc_1570863541668_9278.txt
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971290#comment-16971290 ]

Li Cheng commented on HDDS-2356:
--------------------------------

[~bharat] In terms of the key in the last stacktrace:

2019-11-08 20:08:24,832 ERROR org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCompleteRequest: MultipartUpload Complete request failed for Key: plc_1570863541668_9278 in Volume/Bucket s325d55ad283aa400af464c76d713c07ad/ozone-test
INVALID_PART org.apache.hadoop.ozone.om.exceptions.OMException: Complete Multipart Upload Failed: volume: s325d55ad283aa400af464c76d713c07adbucket: ozone-testkey: plc_1570863541668_9278

The OM audit logs show a bunch of entries for key plc_1570863541668_9278 with different clientIDs, for instance:

2019-11-08 20:19:56,241 | INFO | OMAudit | user=root | ip=9.134.50.210 | op=ALLOCATE_BLOCK \{volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, key=plc_1570863541668_9278, dataSize=5242880, replicationType=RATIS, replicationFactor=THREE, keyLocationInfo=[], clientID=103102209394803336} | ret=SUCCESS |

I'm uploading the entire OM audit logs for this key plc_1570863541668_9278.
[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971289#comment-16971289 ]

Li Cheng edited comment on HDDS-2356 at 11/11/19 3:14 AM:
----------------------------------------------------------

[~bharat] As I said, debug logging for goofys doesn't help, since goofys's debug mode does everything in a single thread. The same error is not reproduced? Are you modeling a sample dataset and environment to test the issue? I can try reproducing from my side and upload the OM log, the s3g log, as well as the audit log here. Does that work?

Also, do you see 2019-11-06_18_13_57_422_ERROR in the attachments? Does it help?

was (Author: timmylicheng):

[~bharat] As I said, debug logging for goofys doesn't help, since goofys's debug mode does everything in a single thread. The same error is not reproduced? Are you modeling a sample dataset and environment to test the issue? I can try reproducing from my side and upload the OM log, the s3g log, as well as the audit log here. Does that work?
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970119#comment-16970119 ] Li Cheng commented on HDDS-2356: [~bharat] New error shows up using today's master branch. 2019-11-08 20:08:24,832 ERROR org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCompleteRequest: MultipartUpload Complete request failed for Key: plc_1570863541668_9278 in Volume/Bucket s325d55ad283aa400af464c76d713c07ad/ozone-test INVALID_PART org.apache.hadoop.ozone.om.exceptions.OMException: Complete Multipart Upload Failed: volume: s325d55ad283aa400af464c76d713c07adbucket: ozone-testkey: plc_1570863541668_9278 at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCompleteRequest.validateAndUpdateCache(S3MultipartUploadCompleteRequest.java:187) at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.java:217) at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:132) at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72) at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:100) at org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > Attachments: 2019-11-06_18_13_57_422_ERROR, hs_err_pid9340.log, > image-2019-10-31-18-56-56-177.png > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. > > Updated on 11/06/2019: > See new multipart upload error NO_SUCH_MULTIPART_UPLOAD_ERROR and full logs > are in the attachment. 
> 2019-11-05 18:12:37,766 ERROR > org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest: > MultipartUpload Commit is failed for Key:./2 > 0191012/plc_1570863541668_9278 in Volume/Bucket > s325d55ad283aa400af464c76d713c07ad/ozone-test > NO_SUCH_MULTIPART_UPLOAD_ERROR > org.apache.hadoop.ozone.om.exceptions.OMException: No such Multipart upload > is with specified uploadId fcda8608-b431-48b7-8386- > 0a332f1a709a-103084683261641950 > at > org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:1 > 56) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB. > java:217) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:132) > at > org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:100) > at >
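For context on the INVALID_PART, MISMATCH_MULTIPART_LIST and NO_SUCH_MULTIPART_UPLOAD_ERROR failures traded in this thread: goofys drives the standard three-step S3 multipart flow through the Ozone S3 gateway, and each error maps to one step of that flow. Below is a minimal boto3 sketch of that flow; the gateway port comes from the logs above, while the credentials, key names, and part sizes are illustrative assumptions, not values from the cluster.

import boto3

# Assumed: Ozone S3 gateway on localhost:9878 (as in the logs); dummy credentials.
s3 = boto3.client('s3', endpoint_url='http://localhost:9878',
                  aws_access_key_id='testuser', aws_secret_access_key='testsecret')

bucket, key = 'ozone-test', '20191012/plc_1570863541668_9278'

# Step 1: initiate. OM records the returned UploadId; if OM later has no
# record of the id, part commits fail with NO_SUCH_MULTIPART_UPLOAD_ERROR.
upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)['UploadId']

# Step 2: upload parts (5 MB minimum for every part except the last).
parts = []
for n in (1, 2):
    resp = s3.upload_part(Bucket=bucket, Key=key, UploadId=upload_id,
                          PartNumber=n, Body=b'x' * (5 * 1024 * 1024))
    parts.append({'PartNumber': n, 'ETag': resp['ETag']})

# Step 3: complete. OM validates this part list against what it recorded;
# a disagreement surfaces as MISMATCH_MULTIPART_LIST or INVALID_PART.
s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
                             MultipartUpload={'Parts': parts})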
[jira] [Updated] (HDDS-2443) Python client/interface for Ozone
[ https://issues.apache.org/jira/browse/HDDS-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng updated HDDS-2443: --- Attachment: (was: OzoneS3.py) > Python client/interface for Ozone > - > > Key: HDDS-2443 > URL: https://issues.apache.org/jira/browse/HDDS-2443 > Project: Hadoop Distributed Data Store > Issue Type: New Feature > Components: Ozone Client >Reporter: Li Cheng >Priority: Major > Attachments: OzoneS3.py > > > Original ideas: item#25 in > [https://cwiki.apache.org/confluence/display/HADOOP/Ozone+project+ideas+for+new+contributors] > Ozone Client(Python) for Data Science Notebook such as Jupyter. > # Size: Large > # PyArrow: [https://pypi.org/project/pyarrow/] > # Python -> libhdfs HDFS JNI library (HDFS, S3,...) -> Java client API > Impala uses libhdfs > > Path to try: > # s3 interface: Ozone s3 gateway(already supported) + AWS python client > (boto3) > # python native RPC > # pyarrow + libhdfs, which use the Java client under the hood. > # python + C interface of go / rust ozone library. I created POC go / rust > clients earlier which can be improved if the libhdfs interface is not good > enough. [By [~elek]] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2443) Python client/interface for Ozone
[ https://issues.apache.org/jira/browse/HDDS-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng updated HDDS-2443: --- Attachment: OzoneS3.py > Python client/interface for Ozone > - > > Key: HDDS-2443 > URL: https://issues.apache.org/jira/browse/HDDS-2443 > Project: Hadoop Distributed Data Store > Issue Type: New Feature > Components: Ozone Client >Reporter: Li Cheng >Priority: Major > Attachments: OzoneS3.py > > > Original ideas: item#25 in > [https://cwiki.apache.org/confluence/display/HADOOP/Ozone+project+ideas+for+new+contributors] > Ozone Client(Python) for Data Science Notebook such as Jupyter. > # Size: Large > # PyArrow: [https://pypi.org/project/pyarrow/] > # Python -> libhdfs HDFS JNI library (HDFS, S3,...) -> Java client API > Impala uses libhdfs > > Path to try: > # s3 interface: Ozone s3 gateway(already supported) + AWS python client > (boto3) > # python native RPC > # pyarrow + libhdfs, which use the Java client under the hood. > # python + C interface of go / rust ozone library. I created POC go / rust > clients earlier which can be improved if the libhdfs interface is not good > enough. [By [~elek]] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2443) Python client/interface for Ozone
[ https://issues.apache.org/jira/browse/HDDS-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng updated HDDS-2443: --- Attachment: OzoneS3.py > Python client/interface for Ozone > - > > Key: HDDS-2443 > URL: https://issues.apache.org/jira/browse/HDDS-2443 > Project: Hadoop Distributed Data Store > Issue Type: New Feature > Components: Ozone Client >Reporter: Li Cheng >Priority: Major > Attachments: OzoneS3.py > > > Original ideas: item#25 in > [https://cwiki.apache.org/confluence/display/HADOOP/Ozone+project+ideas+for+new+contributors] > Ozone Client(Python) for Data Science Notebook such as Jupyter. > # Size: Large > # PyArrow: [https://pypi.org/project/pyarrow/] > # Python -> libhdfs HDFS JNI library (HDFS, S3,...) -> Java client API > Impala uses libhdfs > > Path to try: > # s3 interface: Ozone s3 gateway(already supported) + AWS python client > (boto3) > # python native RPC > # pyarrow + libhdfs, which use the Java client under the hood. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2443) Python client/interface for Ozone
[ https://issues.apache.org/jira/browse/HDDS-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969974#comment-16969974 ] Li Cheng commented on HDDS-2443: [~elek] That would be great! I've uploaded my naive version of the interface. :) > Python client/interface for Ozone > - > > Key: HDDS-2443 > URL: https://issues.apache.org/jira/browse/HDDS-2443 > Project: Hadoop Distributed Data Store > Issue Type: New Feature > Components: Ozone Client >Reporter: Li Cheng >Priority: Major > Attachments: OzoneS3.py > > > Original ideas: item#25 in > [https://cwiki.apache.org/confluence/display/HADOOP/Ozone+project+ideas+for+new+contributors] > Ozone Client(Python) for Data Science Notebook such as Jupyter. > # Size: Large > # PyArrow: [https://pypi.org/project/pyarrow/] > # Python -> libhdfs HDFS JNI library (HDFS, S3,...) -> Java client API > Impala uses libhdfs > > Path to try: > # s3 interface: Ozone s3 gateway(already supported) + AWS python client > (boto3) > # python native RPC > # pyarrow + libhdfs, which use the Java client under the hood. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng updated HDDS-2356: --- Attachment: 2019-11-06_18_13_57_422_ERROR > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > Attachments: 2019-11-06_18_13_57_422_ERROR, hs_err_pid9340.log, > image-2019-10-31-18-56-56-177.png > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. > > Updated on 11/06/2019: > See new multipart upload error NO_SUCH_MULTIPART_UPLOAD_ERROR and full logs > are in the attachment. > 2019-11-05 18:12:37,766 ERROR > org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest: > MultipartUpload Commit is failed for Key:./2 > 0191012/plc_1570863541668_9278 in Volume/Bucket > s325d55ad283aa400af464c76d713c07ad/ozone-test > NO_SUCH_MULTIPART_UPLOAD_ERROR > org.apache.hadoop.ozone.om.exceptions.OMException: No such Multipart upload > is with specified uploadId fcda8608-b431-48b7-8386- > 0a332f1a709a-103084683261641950 > at > org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:1 > 56) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB. > java:217) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:132) > at > org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:100) > at > org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > > Updated on 10/28/2019: > See MISMATCH_MULTIPART_LIST error. 
> > 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete > Multipart Upload Request for bucket: ozone-test, key: > 20191012/plc_1570863541668_927 > 8 > MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: > Complete Multipart Upload Failed: volume: > s3c89e813c80ffcea9543004d57b2a1239bucket: > ozone-testkey: 20191012/plc_1570863541668_9278 > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732) > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.completeMultipartUpload(OzoneManagerProtocolClientSideTranslatorPB > .java:1104) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:66) > at com.sun.proxy.$Proxy82.completeMultipartUpload(Unknown Source) > at >
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969888#comment-16969888 ] Li Cheng commented on HDDS-2356: [~bharat] Please check the attachment one more time. I re-uploaded the logs. > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > Attachments: 2019-11-06_18_13_57_422_ERROR, hs_err_pid9340.log, > image-2019-10-31-18-56-56-177.png > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. > > Updated on 11/06/2019: > See new multipart upload error NO_SUCH_MULTIPART_UPLOAD_ERROR and full logs > are in the attachment. > 2019-11-05 18:12:37,766 ERROR > org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest: > MultipartUpload Commit is failed for Key:./2 > 0191012/plc_1570863541668_9278 in Volume/Bucket > s325d55ad283aa400af464c76d713c07ad/ozone-test > NO_SUCH_MULTIPART_UPLOAD_ERROR > org.apache.hadoop.ozone.om.exceptions.OMException: No such Multipart upload > is with specified uploadId fcda8608-b431-48b7-8386- > 0a332f1a709a-103084683261641950 > at > org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:1 > 56) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB. > java:217) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:132) > at > org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:100) > at > org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > > Updated on 10/28/2019: > See MISMATCH_MULTIPART_LIST error. 
> > 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete > Multipart Upload Request for bucket: ozone-test, key: > 20191012/plc_1570863541668_927 > 8 > MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: > Complete Multipart Upload Failed: volume: > s3c89e813c80ffcea9543004d57b2a1239bucket: > ozone-testkey: 20191012/plc_1570863541668_9278 > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732) > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.completeMultipartUpload(OzoneManagerProtocolClientSideTranslatorPB > .java:1104) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:66) > at com.sun.proxy.$Proxy82.completeMultipartUpload(Unknown Source) > at >
[jira] [Commented] (HDDS-2443) Python client/interface for Ozone
[ https://issues.apache.org/jira/browse/HDDS-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969771#comment-16969771 ] Li Cheng commented on HDDS-2443: Prototyping with the S3 gateway + boto3 now. Reads, writes and deletes can be done. Large object reads may need some tweaking. The only concern is that while it's merely uploading files to S3, it shows a read timeout towards the ozone endpoint: ReadTimeoutError: Read timeout on endpoint URL: "http://localhost:9878/ozone-test/./20191011/plc_1570784946653_2774" > Python client/interface for Ozone > - > > Key: HDDS-2443 > URL: https://issues.apache.org/jira/browse/HDDS-2443 > Project: Hadoop Distributed Data Store > Issue Type: New Feature > Components: Ozone Client >Reporter: Li Cheng >Priority: Major > > Original ideas: > Ozone Client(Python) for Data Science Notebook such as Jupyter. > # Size: Large > # PyArrow: [https://pypi.org/project/pyarrow/] > # Python -> libhdfs HDFS JNI library (HDFS, S3,...) -> Java client API > Impala uses libhdfs > # How Jupyter iPython work: > [https://jupyter.readthedocs.io/en/latest/architecture/how_jupyter_ipython_work.html] > # Eco, > Architecture:[https://ipython-books.github.io/chapter-3-mastering-the-jupyter-notebook/] > > Path to try: > 1. s3 interface: Ozone s3 gateway(already supported) + AWS python client > (boto3) > 2. python native RPC > 3. pyarrow + libhdfs, which use the Java client under the hood. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
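A rough sketch of the prototype described in the comment above (boto3 against the local Ozone S3 gateway), with the read timeout raised above botocore's 60-second default as one plausible mitigation for the quoted ReadTimeoutError; the endpoint is taken from the timeout message, while the credentials and key names are placeholder assumptions.

import boto3
from botocore.config import Config

# Assumed: local Ozone S3 gateway, as in the timeout message above;
# credentials are placeholders (s3g without security accepts dummy keys).
s3 = boto3.client('s3',
                  endpoint_url='http://localhost:9878',
                  aws_access_key_id='testuser',
                  aws_secret_access_key='testsecret',
                  # Raise the default 60s read timeout that large uploads can hit.
                  config=Config(read_timeout=300, retries={'max_attempts': 3}))

bucket = 'ozone-test'

# The three operations reported working: write, read, delete.
s3.put_object(Bucket=bucket, Key='20191011/demo', Body=b'hello ozone')
data = s3.get_object(Bucket=bucket, Key='20191011/demo')['Body'].read()
s3.delete_object(Bucket=bucket, Key='20191011/demo')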
[jira] [Created] (HDDS-2443) Python client/interface for Ozone
Li Cheng created HDDS-2443: -- Summary: Python client/interface for Ozone Key: HDDS-2443 URL: https://issues.apache.org/jira/browse/HDDS-2443 Project: Hadoop Distributed Data Store Issue Type: New Feature Components: Ozone Client Reporter: Li Cheng Original ideas: Ozone Client(Python) for Data Science Notebook such as Jupyter. # Size: Large # PyArrow: [https://pypi.org/project/pyarrow/] # Python -> libhdfs HDFS JNI library (HDFS, S3,...) -> Java client API Impala uses libhdfs # How Jupyter and IPython work: [https://jupyter.readthedocs.io/en/latest/architecture/how_jupyter_ipython_work.html] # Eco, Architecture:[https://ipython-books.github.io/chapter-3-mastering-the-jupyter-notebook/] Path to try: 1. s3 interface: Ozone s3 gateway(already supported) + AWS python client (boto3) 2. python native RPC 3. pyarrow + libhdfs, which uses the Java client under the hood (see the sketch below). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
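Of the paths listed above, path 1 is what the boto3 sketches earlier in this digest show; path 3 would look roughly like the sketch below. It assumes a Hadoop client installation on the machine whose fs.defaultFS points at Ozone (o3fs), with HADOOP_HOME and CLASSPATH set so libhdfs can load, and uses pyarrow's legacy libhdfs bridge; the volume/bucket/key path is illustrative and untested.

import pyarrow as pa

# pyarrow's libhdfs bridge: host='default' picks up fs.defaultFS from the
# Hadoop configuration on the CLASSPATH, which would point at o3fs for Ozone.
fs = pa.hdfs.connect(host='default', port=0)

# Illustrative volume/bucket/key path, not taken from the ticket.
with fs.open('/vol1/bucket1/hello.txt', 'wb') as f:
    f.write(b'hello ozone')

print(fs.ls('/vol1/bucket1'))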
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968978#comment-16968978 ] Li Cheng commented on HDDS-2356: [~bharat] My program won't abort, but I see some errors in s3g logs with matching timestamps. Not sure if it's related. Nov 06, 2019 6:11:35 PM org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference cleanQueue SEVERE: *~*~*~ Channel ManagedChannelImpl\{logId=32225, target=9.134.51.232:9859} was not shutdown properly!!! ~*~*~* Make sure to call shutdown()/shutdownNow() and wait until awaitTermination() returns true. java.lang.RuntimeException: ManagedChannel allocation site at org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.(ManagedChannelOrphanWrapper.java:103) at org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:53) at org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:44) at org.apache.ratis.thirdparty.io.grpc.internal.AbstractManagedChannelImplBuilder.build(AbstractManagedChannelImplBuilder.java:411) at org.apache.hadoop.hdds.scm.XceiverClientGrpc.connectToDatanode(XceiverClientGrpc.java:192) at org.apache.hadoop.hdds.scm.XceiverClientGrpc.connect(XceiverClientGrpc.java:139) at org.apache.hadoop.hdds.scm.XceiverClientManager$2.call(XceiverClientManager.java:242) at org.apache.hadoop.hdds.scm.XceiverClientManager$2.call(XceiverClientManager.java:226) at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4767) at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568) at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350) at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313) at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228) at com.google.common.cache.LocalCache.get(LocalCache.java:3965) at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764) at org.apache.hadoop.hdds.scm.XceiverClientManager.getClient(XceiverClientManager.java:226) at org.apache.hadoop.hdds.scm.XceiverClientManager.acquireClient(XceiverClientManager.java:172) at org.apache.hadoop.hdds.scm.XceiverClientManager.acquireClientForReadData(XceiverClientManager.java:162) at org.apache.hadoop.hdds.scm.storage.BlockInputStream.getChunkInfos(BlockInputStream.java:154) at org.apache.hadoop.hdds.scm.storage.BlockInputStream.initialize(BlockInputStream.java:118) at org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:224) at org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:173) at org.apache.hadoop.ozone.client.io.OzoneInputStream.read(OzoneInputStream.java:47) at java.io.InputStream.read(InputStream.java:101) at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2146) at org.apache.commons.io.IOUtils.copy(IOUtils.java:2102) at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2123) at org.apache.commons.io.IOUtils.copy(IOUtils.java:2078) at org.apache.hadoop.ozone.s3.endpoint.ObjectEndpoint.lambda$get$0(ObjectEndpoint.java:252) at org.glassfish.jersey.message.internal.StreamingOutputProvider.writeTo(StreamingOutputProvider.java:79) at org.glassfish.jersey.message.internal.StreamingOutputProvider.writeTo(StreamingOutputProvider.java:61) at 
org.glassfish.jersey.message.internal.WriterInterceptorExecutor$TerminalWriterInterceptor.invokeWriteTo(WriterInterceptorExecutor.java:266) at org.glassfish.jersey.message.internal.WriterInterceptorExecutor$TerminalWriterInterceptor.aroundWriteTo(WriterInterceptorExecutor.java:251) at org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:163) at org.glassfish.jersey.server.internal.JsonWithPaddingInterceptor.aroundWriteTo(JsonWithPaddingInterceptor.java:109) at org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:163) at org.glassfish.jersey.server.internal.MappableExceptionWrapperInterceptor.aroundWriteTo(MappableExceptionWrapperInterceptor.java:85) at org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:163) at org.glassfish.jersey.message.internal.MessageBodyFactory.writeTo(MessageBodyFactory.java:1135) at org.glassfish.jersey.server.ServerRuntime$Responder.writeResponse(ServerRuntime.java:662) at org.glassfish.jersey.server.ServerRuntime$Responder.processResponse(ServerRuntime.java:395) at org.glassfish.jersey.server.ServerRuntime$Responder.process(ServerRuntime.java:385) at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:280) at org.glassfish.jersey.internal.Errors$1.call(Errors.java:272) at
[jira] [Updated] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng updated HDDS-2356: --- Description: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say it's VM0. I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path on VM0, while reading data from VM0 local disk and write to mount path. The dataset has various sizes of files from 0 byte to GB-level and it has a number of ~50,000 files. The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors related with Multipart upload. This error eventually causes the writing to terminate and OM to be closed. Updated on 11/06/2019: See new multipart upload error NO_SUCH_MULTIPART_UPLOAD_ERROR and full logs are in the attachment. 2019-11-05 18:12:37,766 ERROR org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest: MultipartUpload Commit is failed for Key:./2 0191012/plc_1570863541668_9278 in Volume/Bucket s325d55ad283aa400af464c76d713c07ad/ozone-test NO_SUCH_MULTIPART_UPLOAD_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: No such Multipart upload is with specified uploadId fcda8608-b431-48b7-8386- 0a332f1a709a-103084683261641950 at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:1 56) at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB. java:217) at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:132) at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72) at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:100) at org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) Updated on 10/28/2019: See MISMATCH_MULTIPART_LIST error. 
2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete Multipart Upload Request for bucket: ozone-test, key: 20191012/plc_1570863541668_927 8 MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: Complete Multipart Upload Failed: volume: s3c89e813c80ffcea9543004d57b2a1239bucket: ozone-testkey: 20191012/plc_1570863541668_9278 at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732) at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.completeMultipartUpload(OzoneManagerProtocolClientSideTranslatorPB .java:1104) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:66) at com.sun.proxy.$Proxy82.completeMultipartUpload(Unknown Source) at org.apache.hadoop.ozone.client.rpc.RpcClient.completeMultipartUpload(RpcClient.java:883) at org.apache.hadoop.ozone.client.OzoneBucket.completeMultipartUpload(OzoneBucket.java:445) at org.apache.hadoop.ozone.s3.endpoint.ObjectEndpoint.completeMultipartUpload(ObjectEndpoint.java:498) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148) at
[jira] [Updated] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng updated HDDS-2356: --- Description: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say it's VM0. I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path on VM0, while reading data from VM0 local disk and write to mount path. The dataset has various sizes of files from 0 byte to GB-level and it has a number of ~50,000 files. The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors related with Multipart upload. This error eventually causes the writing to terminate and OM to be closed. Updated on 11/06/2019: See new multipart upload error and full logs are in the attachment. 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete Multipart Upload Request for bucket: ozone-test, key: 20191012/plc_1570863541668_927 8 MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: Complete Multipart Upload Failed: volume: s3c89e813c80ffcea9543004d57b2a1239bucket: ozone-testkey: 20191012/plc_1570863541668_9278 at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732) at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.completeMultipartUpload(OzoneManagerProtocolClientSideTranslatorPB .java:1104) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:66) at com.sun.proxy.$Proxy82.completeMultipartUpload(Unknown Source) at org.apache.hadoop.ozone.client.rpc.RpcClient.completeMultipartUpload(RpcClient.java:883) at org.apache.hadoop.ozone.client.OzoneBucket.completeMultipartUpload(OzoneBucket.java:445) at org.apache.hadoop.ozone.s3.endpoint.ObjectEndpoint.completeMultipartUpload(ObjectEndpoint.java:498) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:191) at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:200) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:103) at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:493) The following errors have been resolved in https://issues.apache.org/jira/browse/HDDS-2322. 
2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with exit status 2: OMDoubleBuffer flush threadOMDoubleBufferFlushThreadencountered Throwable error java.util.ConcurrentModificationException at java.util.TreeMap.forEach(TreeMap.java:1004) at org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111) at org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38) at org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31) at org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68) at org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125) at org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112) at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137) at java.util.Iterator.forEachRemaining(Iterator.java:116) at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135) at java.lang.Thread.run(Thread.java:745) 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG: was: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say it's VM0. I use goofys as a fuse and enable ozone S3
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968957#comment-16968957 ] Li Cheng commented on HDDS-2356: See new Multipart upload error in yesterday's master. [~bharat] Check the newly attached log 2019-11-06 18:13:57,422 ERROR for more info. It seems to fail on multiple keys. 2019-11-05 18:12:37,766 ERROR org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest: MultipartUpload Commit is failed for Key:./2 0191012/plc_1570863541668_9278 in Volume/Bucket s325d55ad283aa400af464c76d713c07ad/ozone-test NO_SUCH_MULTIPART_UPLOAD_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: No such Multipart upload is with specified uploadId fcda8608-b431-48b7-8386- 0a332f1a709a-103084683261641950 at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:1 56) at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB. java:217) at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:132) at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72) at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:100) at org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > Attachments: hs_err_pid9340.log, image-2019-10-31-18-56-56-177.png > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. 
> > 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete > Multipart Upload Request for bucket: ozone-test, key: > 20191012/plc_1570863541668_927 > 8 > MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: > Complete Multipart Upload Failed: volume: > s3c89e813c80ffcea9543004d57b2a1239bucket: > ozone-testkey: 20191012/plc_1570863541668_9278 > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732) > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.completeMultipartUpload(OzoneManagerProtocolClientSideTranslatorPB > .java:1104) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:66) > at com.sun.proxy.$Proxy82.completeMultipartUpload(Unknown Source) > at > org.apache.hadoop.ozone.client.rpc.RpcClient.completeMultipartUpload(RpcClient.java:883) > at >
[jira] [Commented] (HDDS-2396) OM rocksdb core dump during writing
[ https://issues.apache.org/jira/browse/HDDS-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966372#comment-16966372 ] Li Cheng commented on HDDS-2396: [~bharat] Probably not up to date if HDDS-2379 was resolved within the last week. I can try the latest master again. > OM rocksdb core dump during writing > --- > > Key: HDDS-2396 > URL: https://issues.apache.org/jira/browse/HDDS-2396 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 >Reporter: Li Cheng >Priority: Major > Attachments: hs_err_pid9340.log > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > > There happens core dump in rocksdb while it's occasional. > > Stack: [0x7f5891a23000,0x7f5891b24000], sp=0x7f5891b21bb8, free > space=1018k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native > code) > C [libc.so.6+0x151d60] __memmove_ssse3_back+0x1ae0 > C [librocksdbjni3192271038586903156.so+0x358fec] > rocksdb::MemTableInserter::PutCFImpl(unsigned int, rocksdb::Slice const&, > rocksdb::Slice const&, rocksdb: > :ValueType)+0x51c > C [librocksdbjni3192271038586903156.so+0x359d17] > rocksdb::MemTableInserter::PutCF(unsigned int, rocksdb::Slice const&, > rocksdb::Slice const&)+0x17 > C [librocksdbjni3192271038586903156.so+0x3513bc] > rocksdb::WriteBatch::Iterate(rocksdb::WriteBatch::Handler*) const+0x45c > C [librocksdbjni3192271038586903156.so+0x354df9] > rocksdb::WriteBatchInternal::InsertInto(rocksdb::WriteThread::WriteGroup&, > unsigned long, rocksdb::ColumnFamilyMemTables*, rocksdb::FlushScheduler*, > bool, unsigned long, rocksdb::DB*, bool, bool, bool)+0x1f9 > C [librocksdbjni3192271038586903156.so+0x29fd79] > rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, > rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, > bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x24b9 > C [librocksdbjni3192271038586903156.so+0x2a0431] > rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, > rocksdb::WriteBatch*)+0x21 > C [librocksdbjni3192271038586903156.so+0x1a064c] > Java_org_rocksdb_RocksDB_write0+0xcc > J 7899 org.rocksdb.RocksDB.write0(JJJ)V (0 bytes) @ 0x7f58f1872dbe > [0x7f58f1872d00+0xbe] > J 10093% C1 > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions()V > (400 bytes) @ 0x7f58f2308b0c [0x7f58f2307a40+0x10cc] > j > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer$$Lambda$29.run()V+4 > j java.lang.Thread.run()V+11 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964554#comment-16964554 ] Li Cheng edited comment on HDDS-2356 at 11/1/19 3:45 AM: - Also see a core dump in rocksdb during last night's testing. Please check the attachment for the entire log. At first glance, it looks like when rocksdb is iterating the write_batch to insert into the memtable, an STL memory error happens during memory movement. It might not be related to ozone, but it would cause a rocksdb failure. Created https://issues.apache.org/jira/browse/HDDS-2396 to track the core dump in OM rocksdb. Below is some part of the stack: C [libc.so.6+0x151d60] __memmove_ssse3_back+0x1ae0 C [librocksdbjni3192271038586903156.so+0x358fec] rocksdb::MemTableInserter::PutCFImpl(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&, rocksdb: :ValueType)+0x51c C [librocksdbjni3192271038586903156.so+0x359d17] rocksdb::MemTableInserter::PutCF(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&)+0x17 C [librocksdbjni3192271038586903156.so+0x3513bc] rocksdb::WriteBatch::Iterate(rocksdb::WriteBatch::Handler*) const+0x45c C [librocksdbjni3192271038586903156.so+0x354df9] rocksdb::WriteBatchInternal::InsertInto(rocksdb::WriteThread::WriteGroup&, unsigned long, rocksdb::ColumnFamilyMemTables*, rocksdb::FlushScheduler*, bool, unsigned long, rocksdb::DB*, bool, bool, bool)+0x1f9 C [librocksdbjni3192271038586903156.so+0x29fd79] rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x24b9 C [librocksdbjni3192271038586903156.so+0x2a0431] rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x21 C [librocksdbjni3192271038586903156.so+0x1a064c] Java_org_rocksdb_RocksDB_write0+0xcc J 7899 org.rocksdb.RocksDB.write0(JJJ)V (0 bytes) @ 0x7f58f1872dbe [0x7f58f1872d00+0xbe] J 10093% C1 org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions()V (400 bytes) @ 0x7f58f2308b0c [0x7f58f2307a40+0x10cc] j org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer$$Lambda$29.run()V+4 was (Author: timmylicheng): Also see a core dump in rocksdb during last night's testing. Please check the attachment for the entire log. At first glance, it looks like when rocksdb is iterating the write_batch to insert into the memtable, an STL memory error happens during memory movement. It might not be related to ozone, but it would cause a rocksdb failure. 
Below is some part of the stack: C [libc.so.6+0x151d60] __memmove_ssse3_back+0x1ae0 C [librocksdbjni3192271038586903156.so+0x358fec] rocksdb::MemTableInserter::PutCFImpl(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&, rocksdb: :ValueType)+0x51c C [librocksdbjni3192271038586903156.so+0x359d17] rocksdb::MemTableInserter::PutCF(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&)+0x17 C [librocksdbjni3192271038586903156.so+0x3513bc] rocksdb::WriteBatch::Iterate(rocksdb::WriteBatch::Handler*) const+0x45c C [librocksdbjni3192271038586903156.so+0x354df9] rocksdb::WriteBatchInternal::InsertInto(rocksdb::WriteThread::WriteGroup&, unsigned long, rocksdb::ColumnFamilyMemTables*, rocksdb::FlushScheduler*, bool, unsigned long, rocksdb::DB*, bool, bool, bool)+0x1f9 C [librocksdbjni3192271038586903156.so+0x29fd79] rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x24b9 C [librocksdbjni3192271038586903156.so+0x2a0431] rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x21 C [librocksdbjni3192271038586903156.so+0x1a064c] Java_org_rocksdb_RocksDB_write0+0xcc J 7899 org.rocksdb.RocksDB.write0(JJJ)V (0 bytes) @ 0x7f58f1872dbe [0x7f58f1872d00+0xbe] J 10093% C1 org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions()V (400 bytes) @ 0x7f58f2308b0c [0x7f58f2307a40+0x10cc] j org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer$$Lambda$29.run()V+4 > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > Attachments:
[jira] [Commented] (HDDS-2396) OM rocksdb core dump during writing
[ https://issues.apache.org/jira/browse/HDDS-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964578#comment-16964578 ] Li Cheng commented on HDDS-2396: Attached the entire log for the core dump. Will try to enable core dumps with ulimit and reproduce this, but it happens only occasionally. > OM rocksdb core dump during writing > --- > > Key: HDDS-2396 > URL: https://issues.apache.org/jira/browse/HDDS-2396 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 >Reporter: Li Cheng >Priority: Major > Attachments: hs_err_pid9340.log > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > > There happens core dump in rocksdb while it's occasional. > > Stack: [0x7f5891a23000,0x7f5891b24000], sp=0x7f5891b21bb8, free > space=1018k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native > code) > C [libc.so.6+0x151d60] __memmove_ssse3_back+0x1ae0 > C [librocksdbjni3192271038586903156.so+0x358fec] > rocksdb::MemTableInserter::PutCFImpl(unsigned int, rocksdb::Slice const&, > rocksdb::Slice const&, rocksdb: > :ValueType)+0x51c > C [librocksdbjni3192271038586903156.so+0x359d17] > rocksdb::MemTableInserter::PutCF(unsigned int, rocksdb::Slice const&, > rocksdb::Slice const&)+0x17 > C [librocksdbjni3192271038586903156.so+0x3513bc] > rocksdb::WriteBatch::Iterate(rocksdb::WriteBatch::Handler*) const+0x45c > C [librocksdbjni3192271038586903156.so+0x354df9] > rocksdb::WriteBatchInternal::InsertInto(rocksdb::WriteThread::WriteGroup&, > unsigned long, rocksdb::ColumnFamilyMemTables*, rocksdb::FlushScheduler*, > bool, unsigned long, rocksdb::DB*, bool, bool, bool)+0x1f9 > C [librocksdbjni3192271038586903156.so+0x29fd79] > rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, > rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, > bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x24b9 > C [librocksdbjni3192271038586903156.so+0x2a0431] > rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, > rocksdb::WriteBatch*)+0x21 > C [librocksdbjni3192271038586903156.so+0x1a064c] > Java_org_rocksdb_RocksDB_write0+0xcc > J 7899 org.rocksdb.RocksDB.write0(JJJ)V (0 bytes) @ 0x7f58f1872dbe > [0x7f58f1872d00+0xbe] > J 10093% C1 > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions()V > (400 bytes) @ 0x7f58f2308b0c [0x7f58f2307a40+0x10cc] > j > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer$$Lambda$29.run()V+4 > j java.lang.Thread.run()V+11 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2396) OM rocksdb core dump during writing
[ https://issues.apache.org/jira/browse/HDDS-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng updated HDDS-2396: --- Attachment: hs_err_pid9340.log > OM rocksdb core dump during writing > --- > > Key: HDDS-2396 > URL: https://issues.apache.org/jira/browse/HDDS-2396 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 >Reporter: Li Cheng >Priority: Major > Attachments: hs_err_pid9340.log > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > > There happens core dump in rocksdb while it's occasional. > > Stack: [0x7f5891a23000,0x7f5891b24000], sp=0x7f5891b21bb8, free > space=1018k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native > code) > C [libc.so.6+0x151d60] __memmove_ssse3_back+0x1ae0 > C [librocksdbjni3192271038586903156.so+0x358fec] > rocksdb::MemTableInserter::PutCFImpl(unsigned int, rocksdb::Slice const&, > rocksdb::Slice const&, rocksdb: > :ValueType)+0x51c > C [librocksdbjni3192271038586903156.so+0x359d17] > rocksdb::MemTableInserter::PutCF(unsigned int, rocksdb::Slice const&, > rocksdb::Slice const&)+0x17 > C [librocksdbjni3192271038586903156.so+0x3513bc] > rocksdb::WriteBatch::Iterate(rocksdb::WriteBatch::Handler*) const+0x45c > C [librocksdbjni3192271038586903156.so+0x354df9] > rocksdb::WriteBatchInternal::InsertInto(rocksdb::WriteThread::WriteGroup&, > unsigned long, rocksdb::ColumnFamilyMemTables*, rocksdb::FlushScheduler*, > bool, unsigned long, rocksdb::DB*, bool, bool, bool)+0x1f9 > C [librocksdbjni3192271038586903156.so+0x29fd79] > rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, > rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, > bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x24b9 > C [librocksdbjni3192271038586903156.so+0x2a0431] > rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, > rocksdb::WriteBatch*)+0x21 > C [librocksdbjni3192271038586903156.so+0x1a064c] > Java_org_rocksdb_RocksDB_write0+0xcc > J 7899 org.rocksdb.RocksDB.write0(JJJ)V (0 bytes) @ 0x7f58f1872dbe > [0x7f58f1872d00+0xbe] > J 10093% C1 > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions()V > (400 bytes) @ 0x7f58f2308b0c [0x7f58f2307a40+0x10cc] > j > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer$$Lambda$29.run()V+4 > j java.lang.Thread.run()V+11 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2396) OM rocksdb core dump during writing
Li Cheng created HDDS-2396: -- Summary: OM rocksdb core dump during writing Key: HDDS-2396 URL: https://issues.apache.org/jira/browse/HDDS-2396 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Manager Affects Versions: 0.4.1 Reporter: Li Cheng Attachments: hs_err_pid9340.log Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say it's VM0. I use goofys as a FUSE client and enable the Ozone S3 gateway to mount Ozone to a path on VM0, reading data from VM0's local disk and writing to the mount path. The dataset contains ~50,000 files ranging in size from 0 bytes to GB-scale. RocksDB occasionally core dumps during this workload. Stack: [0x7f5891a23000,0x7f5891b24000], sp=0x7f5891b21bb8, free space=1018k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) C [libc.so.6+0x151d60] __memmove_ssse3_back+0x1ae0 C [librocksdbjni3192271038586903156.so+0x358fec] rocksdb::MemTableInserter::PutCFImpl(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::ValueType)+0x51c C [librocksdbjni3192271038586903156.so+0x359d17] rocksdb::MemTableInserter::PutCF(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&)+0x17 C [librocksdbjni3192271038586903156.so+0x3513bc] rocksdb::WriteBatch::Iterate(rocksdb::WriteBatch::Handler*) const+0x45c C [librocksdbjni3192271038586903156.so+0x354df9] rocksdb::WriteBatchInternal::InsertInto(rocksdb::WriteThread::WriteGroup&, unsigned long, rocksdb::ColumnFamilyMemTables*, rocksdb::FlushScheduler*, bool, unsigned long, rocksdb::DB*, bool, bool, bool)+0x1f9 C [librocksdbjni3192271038586903156.so+0x29fd79] rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x24b9 C [librocksdbjni3192271038586903156.so+0x2a0431] rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x21 C [librocksdbjni3192271038586903156.so+0x1a064c] Java_org_rocksdb_RocksDB_write0+0xcc J 7899 org.rocksdb.RocksDB.write0(JJJ)V (0 bytes) @ 0x7f58f1872dbe [0x7f58f1872d00+0xbe] J 10093% C1 org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions()V (400 bytes) @ 0x7f58f2308b0c [0x7f58f2307a40+0x10cc] j org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer$$Lambda$29.run()V+4 j java.lang.Thread.run()V+11 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2395) Handle Ozone S3 completeMPU to match with aws s3 behavior.
[ https://issues.apache.org/jira/browse/HDDS-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964573#comment-16964573 ] Li Cheng commented on HDDS-2395: Also please note the exclude list issue. 2019-11-01 11:25:24,047 [qtp1383524016-27648] INFO - Allocating block with ExcludeList {datanodes = [], containerIds = [], pipelineIds = [PipelineID=20d1830a-a77d-498e-a4a1-ba656ead3d97, ... the same PipelineID repeated ~75 times in the original log ...]} > Handle Ozone S3 completeMPU to match with aws s3 behavior. > -- > > Key: HDDS-2395 > URL: https://issues.apache.org/jira/browse/HDDS-2395 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > # When uploaded 2 parts, and when complete upload 1 part no error > # During complete multipart upload name/part number not matching with > uploaded part and part number then InvalidPart error > # When parts are not specified in sorted order InvalidPartOrder > # During complete multipart upload when no
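The log above shows the client adding the same failed pipeline to the exclude list once per retry, so the list grows without bound even though only one pipeline is actually excluded. The sketch below illustrates the deduplication idea; ExcludeListSketch and its methods are hypothetical names, not Ozone's real ExcludeList API -- the point is that backing the collection with a Set keeps repeated retries idempotent.

import java.util.LinkedHashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical minimal exclude list; illustrative only, not Ozone's class.
public class ExcludeListSketch {
  // A Set dedups the same pipeline id added on every retry.
  private final Set<UUID> pipelineIds = new LinkedHashSet<>();

  public void excludePipeline(UUID pipelineId) {
    pipelineIds.add(pipelineId); // re-adding the same id is a no-op
  }

  public Set<UUID> getPipelineIds() {
    return pipelineIds;
  }

  public static void main(String[] args) {
    ExcludeListSketch list = new ExcludeListSketch();
    UUID failing = UUID.fromString("20d1830a-a77d-498e-a4a1-ba656ead3d97");
    for (int retry = 0; retry < 75; retry++) {
      list.excludePipeline(failing); // what the log above shows happening per retry
    }
    System.out.println(list.getPipelineIds().size()); // prints 1, not 75
  }
}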
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964554#comment-16964554 ] Li Cheng commented on HDDS-2356: Also saw a core dump in rocksdb during last night's testing; please check the attachment for the entire log. At first glance, it looks like a memory error occurs during memmove while rocksdb is iterating the write batch to insert entries into the memtable. It may not be related to Ozone itself, but it causes the rocksdb write to fail. Below is part of the stack: C [libc.so.6+0x151d60] __memmove_ssse3_back+0x1ae0 C [librocksdbjni3192271038586903156.so+0x358fec] rocksdb::MemTableInserter::PutCFImpl(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::ValueType)+0x51c C [librocksdbjni3192271038586903156.so+0x359d17] rocksdb::MemTableInserter::PutCF(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&)+0x17 C [librocksdbjni3192271038586903156.so+0x3513bc] rocksdb::WriteBatch::Iterate(rocksdb::WriteBatch::Handler*) const+0x45c C [librocksdbjni3192271038586903156.so+0x354df9] rocksdb::WriteBatchInternal::InsertInto(rocksdb::WriteThread::WriteGroup&, unsigned long, rocksdb::ColumnFamilyMemTables*, rocksdb::FlushScheduler*, bool, unsigned long, rocksdb::DB*, bool, bool, bool)+0x1f9 C [librocksdbjni3192271038586903156.so+0x29fd79] rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x24b9 C [librocksdbjni3192271038586903156.so+0x2a0431] rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x21 C [librocksdbjni3192271038586903156.so+0x1a064c] Java_org_rocksdb_RocksDB_write0+0xcc J 7899 org.rocksdb.RocksDB.write0(JJJ)V (0 bytes) @ 0x7f58f1872dbe [0x7f58f1872d00+0xbe] J 10093% C1 org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions()V (400 bytes) @ 0x7f58f2308b0c [0x7f58f2307a40+0x10cc] j org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer$$Lambda$29.run()V+4 > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > Attachments: hs_err_pid9340.log, image-2019-10-31-18-56-56-177.png > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. 
> > 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete > Multipart Upload Request for bucket: ozone-test, key: > 20191012/plc_1570863541668_927 > 8 > MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: > Complete Multipart Upload Failed: volume: > s3c89e813c80ffcea9543004d57b2a1239bucket: > ozone-testkey: 20191012/plc_1570863541668_9278 > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732) > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.completeMultipartUpload(OzoneManagerProtocolClientSideTranslatorPB > .java:1104) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:66) > at com.sun.proxy.$Proxy82.completeMultipartUpload(Unknown Source) > at > org.apache.hadoop.ozone.client.rpc.RpcClient.completeMultipartUpload(RpcClient.java:883) > at > org.apache.hadoop.ozone.client.OzoneBucket.completeMultipartUpload(OzoneBucket.java:445) > at > org.apache.hadoop.ozone.s3.endpoint.ObjectEndpoint.completeMultipartUpload(ObjectEndpoint.java:498) >
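On the MISMATCH_MULTIPART_LIST error itself: HDDS-2395 (quoted earlier in this digest) lists the AWS-compatible checks that complete-multipart-upload should perform. The sketch below is a guess at what such server-side validation looks like, with illustrative names rather than Ozone's actual code: it flags unsorted part numbers (InvalidPartOrder) and parts whose number/ETag was never recorded at upload time (InvalidPart), while allowing the request to list only a subset of uploaded parts, matching the behavior described in that issue.

import java.util.List;
import java.util.Map;

// Hedged sketch of complete-MPU validation; names are illustrative only.
public class CompleteMpuValidatorSketch {
  /**
   * @param uploadedParts  partNumber -> ETag recorded when each part was uploaded
   * @param requestedParts (partNumber, ETag) pairs from the complete request
   */
  static void validate(Map<Integer, String> uploadedParts,
                       List<Map.Entry<Integer, String>> requestedParts) {
    int previousPartNumber = 0;
    for (Map.Entry<Integer, String> part : requestedParts) {
      int partNumber = part.getKey();
      if (partNumber <= previousPartNumber) {
        throw new IllegalArgumentException(
            "InvalidPartOrder: part numbers must be ascending");
      }
      String uploadedEtag = uploadedParts.get(partNumber);
      if (uploadedEtag == null || !uploadedEtag.equals(part.getValue())) {
        throw new IllegalArgumentException(
            "InvalidPart: part " + partNumber + " was never uploaded or its ETag does not match");
      }
      previousPartNumber = partNumber;
    }
    // Completing with only a subset of the uploaded parts is allowed,
    // matching AWS S3, so there is no reverse check here.
  }
}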
[jira] [Updated] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng updated HDDS-2356: --- Attachment: hs_err_pid9340.log > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > Attachments: hs_err_pid9340.log, image-2019-10-31-18-56-56-177.png > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. > > 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete > Multipart Upload Request for bucket: ozone-test, key: > 20191012/plc_1570863541668_927 > 8 > MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: > Complete Multipart Upload Failed: volume: > s3c89e813c80ffcea9543004d57b2a1239bucket: > ozone-testkey: 20191012/plc_1570863541668_9278 > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732) > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.completeMultipartUpload(OzoneManagerProtocolClientSideTranslatorPB > .java:1104) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:66) > at com.sun.proxy.$Proxy82.completeMultipartUpload(Unknown Source) > at > org.apache.hadoop.ozone.client.rpc.RpcClient.completeMultipartUpload(RpcClient.java:883) > at > org.apache.hadoop.ozone.client.OzoneBucket.completeMultipartUpload(OzoneBucket.java:445) > at > org.apache.hadoop.ozone.s3.endpoint.ObjectEndpoint.completeMultipartUpload(ObjectEndpoint.java:498) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76) > at > org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148) > at > org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:191) > at > 
org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:200) > at > org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:103) > at > org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:493) > > The following errors has been resolved in > https://issues.apache.org/jira/browse/HDDS-2322. > 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with > exit status 2: OMDoubleBuffer flush > threadOMDoubleBufferFlushThreadencountered Throwable error > java.util.ConcurrentModificationException > at java.util.TreeMap.forEach(TreeMap.java:1004) > at > org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31) > at > org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68) > at >
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964138#comment-16964138 ] Li Cheng commented on HDDS-2356: [~msingh] I checked the OM audit log; it shows the same ALLOCATE_KEY request (dataSize=215547) for the same key repeated roughly every five seconds: 2019-10-31 17:44:05,645 | INFO | OMAudit | user=root | ip=9.134.50.210 | op=ALLOCATE_KEY {volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, key=./20191012/plc_1570860558836_9937, dataSize=215547, replicationType=RATIS, replicationFactor=THREE, keyLocationInfo=[blockID { 2019-10-31 17:44:10,725 | INFO | OMAudit | user=root | ip=9.134.50.210 | op=ALLOCATE_KEY {volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, key=./20191012/plc_1570860558836_9937, dataSize=215547, replicationType=RATIS, replicationFactor=THREE, keyLocationInfo=[blockID { 2019-10-31 17:44:15,915 | INFO | OMAudit | user=root | ip=9.134.50.210 | op=ALLOCATE_KEY {volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, key=./20191012/plc_1570860558836_9937, dataSize=215547, replicationType=RATIS, replicationFactor=THREE, keyLocationInfo=[blockID { 2019-10-31 17:44:21,108 | INFO | OMAudit | user=root | ip=9.134.50.210 | op=ALLOCATE_KEY {volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, key=./20191012/plc_1570860558836_9937, dataSize=215547, replicationType=RATIS, replicationFactor=THREE, keyLocationInfo=[blockID { 2019-10-31 17:44:28,319 | INFO | OMAudit | user=root | ip=9.134.50.210 | op=ALLOCATE_KEY {volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, key=./20191012/plc_1570860558836_9937, dataSize=215547, replicationType=RATIS, replicationFactor=THREE, keyLocationInfo=[blockID { 2019-10-31 17:44:32,951 | INFO | OMAudit | user=root | ip=9.134.50.210 | op=ALLOCATE_KEY {volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, key=./20191012/plc_1570860558836_9937, dataSize=215547, replicationType=RATIS, replicationFactor=THREE, keyLocationInfo=[blockID { 2019-10-31 17:44:38,360 | INFO | OMAudit | user=root | ip=9.134.50.210 | op=ALLOCATE_KEY {volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, key=./20191012/plc_1570860558836_9937, dataSize=215547, replicationType=RATIS, replicationFactor=THREE, keyLocationInfo=[blockID { 2019-10-31 17:44:43,390 | INFO | OMAudit | user=root | ip=9.134.50.210 | op=ALLOCATE_KEY {volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, key=./20191012/plc_1570860558836_9937, dataSize=215547, replicationType=RATIS, replicationFactor=THREE, keyLocationInfo=[blockID { 2019-10-31 17:44:48,498 | INFO | OMAudit | user=root | ip=9.134.50.210 | op=ALLOCATE_KEY {volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, key=./20191012/plc_1570860558836_9937, dataSize=215547, replicationType=RATIS, replicationFactor=THREE, keyLocationInfo=[blockID { 2019-10-31 17:44:55,654 | INFO | OMAudit | user=root | ip=9.134.50.210 | op=ALLOCATE_KEY {volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, key=./20191012/plc_1570860558836_9937, dataSize=215547, replicationType=RATIS, replicationFactor=THREE, keyLocationInfo=[blockID { > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > Attachments: 
image-2019-10-31-18-56-56-177.png > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. > > 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete > Multipart Upload Request for bucket: ozone-test, key: > 20191012/plc_1570863541668_927 > 8 > MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: > Complete Multipart Upload Failed: volume: > s3c89e813c80ffcea9543004d57b2a1239bucket: > ozone-testkey: 20191012/plc_1570863541668_9278 > at >
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963875#comment-16963875 ] Li Cheng commented on HDDS-2356: I tried multiprocessing in Python to fetch data locally and write to the remote Ozone cluster through the S3 gateway with boto3 (the AWS Python client). It shows an RPC error: 2019-10-31 17:44:33,123 [qtp1383524016-25] ERROR (OzoneManagerProtocolClientSideTranslatorPB.java:268) - Failed to connect to OM. Attempted 10 retries and 10 failovers 2019-10-31 17:44:33,124 [qtp1383524016-25] ERROR (ObjectEndpoint.java:186) - Exception occurred in PutObject java.io.IOException: Failed on local exception: org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum data length; Host Details : local host is: "VM_50_210_centos/127.0.0.1"; destination host is: "9.134.50.210":9862; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:816) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515) at org.apache.hadoop.ipc.Client.call(Client.java:1457) at org.apache.hadoop.ipc.Client.call(Client.java:1367) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) at com.sun.proxy.$Proxy81.submitRequest(Unknown Source) at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) at com.sun.proxy.$Proxy81.submitRequest(Unknown Source) at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.submitRequest(OzoneManagerProtocolClientSideTranslatorPB.java:338) at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.openKey(OzoneManagerProtocolClientSideTranslatorPB.java:723) at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:66) at com.sun.proxy.$Proxy82.openKey(Unknown Source) at org.apache.hadoop.ozone.client.rpc.RpcClient.createKey(RpcClient.java:614) at org.apache.hadoop.ozone.client.OzoneBucket.createKey(OzoneBucket.java:325) at org.apache.hadoop.ozone.s3.endpoint.ObjectEndpoint.put(ObjectEndpoint.java:173) at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148) at 
org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:191) at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:200) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:103) at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:493) at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:415) at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:104) at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:277) at org.glassfish.jersey.internal.Errors$1.call(Errors.java:272) at org.glassfish.jersey.internal.Errors$1.call(Errors.java:268) at org.glassfish.jersey.internal.Errors.process(Errors.java:316) at org.glassfish.jersey.internal.Errors.process(Errors.java:298) at org.glassfish.jersey.internal.Errors.process(Errors.java:268) at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:289) at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:256) at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:703) at
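A note on "RPC response exceeds maximum data length": Hadoop IPC rejects any message larger than the ipc.maximum.data.length setting, which defaults to 64 MB. The snippet below sketches the workaround of raising that limit; whether raising it is appropriate here is a judgment call, since an OM response that large usually means something is accumulating (for example the ever-growing exclude list noted earlier in this thread), and the real fix is keeping responses small.

import org.apache.hadoop.conf.Configuration;

// Workaround sketch: raise Hadoop IPC's maximum message size for the process
// that logs "RPC response exceeds maximum data length".
public class IpcLimitSketch {
  public static Configuration withLargerIpcLimit() {
    Configuration conf = new Configuration();
    // Default is 64 MB (67108864); double it here as an illustration only.
    conf.setInt("ipc.maximum.data.length", 128 * 1024 * 1024);
    return conf;
  }
}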
[jira] [Updated] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng updated HDDS-2356: --- Attachment: image-2019-10-31-18-56-56-177.png > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > Attachments: image-2019-10-31-18-56-56-177.png > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. > > 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete > Multipart Upload Request for bucket: ozone-test, key: > 20191012/plc_1570863541668_927 > 8 > MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: > Complete Multipart Upload Failed: volume: > s3c89e813c80ffcea9543004d57b2a1239bucket: > ozone-testkey: 20191012/plc_1570863541668_9278 > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732) > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.completeMultipartUpload(OzoneManagerProtocolClientSideTranslatorPB > .java:1104) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:66) > at com.sun.proxy.$Proxy82.completeMultipartUpload(Unknown Source) > at > org.apache.hadoop.ozone.client.rpc.RpcClient.completeMultipartUpload(RpcClient.java:883) > at > org.apache.hadoop.ozone.client.OzoneBucket.completeMultipartUpload(OzoneBucket.java:445) > at > org.apache.hadoop.ozone.s3.endpoint.ObjectEndpoint.completeMultipartUpload(ObjectEndpoint.java:498) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76) > at > org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148) > at > org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:191) > at > 
org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:200) > at > org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:103) > at > org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:493) > > The following errors has been resolved in > https://issues.apache.org/jira/browse/HDDS-2322. > 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with > exit status 2: OMDoubleBuffer flush > threadOMDoubleBufferFlushThreadencountered Throwable error > java.util.ConcurrentModificationException > at java.util.TreeMap.forEach(TreeMap.java:1004) > at > org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31) > at > org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68) > at >
[jira] [Updated] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng updated HDDS-2356: --- Description: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say it's VM0. I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path on VM0, while reading data from VM0 local disk and write to mount path. The dataset has various sizes of files from 0 byte to GB-level and it has a number of ~50,000 files. The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors related with Multipart upload. This error eventually causes the writing to terminate and OM to be closed. 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete Multipart Upload Request for bucket: ozone-test, key: 20191012/plc_1570863541668_927 8 MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: Complete Multipart Upload Failed: volume: s3c89e813c80ffcea9543004d57b2a1239bucket: ozone-testkey: 20191012/plc_1570863541668_9278 at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732) at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.completeMultipartUpload(OzoneManagerProtocolClientSideTranslatorPB .java:1104) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:66) at com.sun.proxy.$Proxy82.completeMultipartUpload(Unknown Source) at org.apache.hadoop.ozone.client.rpc.RpcClient.completeMultipartUpload(RpcClient.java:883) at org.apache.hadoop.ozone.client.OzoneBucket.completeMultipartUpload(OzoneBucket.java:445) at org.apache.hadoop.ozone.s3.endpoint.ObjectEndpoint.completeMultipartUpload(ObjectEndpoint.java:498) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:191) at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:200) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:103) at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:493) The following errors has been resolved in https://issues.apache.org/jira/browse/HDDS-2322. 
2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with exit status 2: OMDoubleBuffer flush threadOMDoubleBufferFlushThreadencountered Throwable error java.util.ConcurrentModificationException at java.util.TreeMap.forEach(TreeMap.java:1004) at org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111) at org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38) at org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31) at org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68) at org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125) at org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112) at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137) at java.util.Iterator.forEachRemaining(Iterator.java:116) at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135) at java.lang.Thread.run(Thread.java:745) 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG: was: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say it's VM0. I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path on VM0, while reading data from VM0 local disk and write to mount
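The ConcurrentModificationException in the trace above is the flush thread calling TreeMap.forEach inside OmMultipartKeyInfo.getProto() while a request-handler thread is still adding parts to the same map. A minimal standalone reproduction of that race follows; the names are illustrative, not OM code. Any fix has to either copy the map before serializing or synchronize both sides; which approach HDDS-2322 took is not shown in this thread.

import java.util.ConcurrentModificationException;
import java.util.TreeMap;

// Minimal reproduction of the TreeMap CME from the quoted HDDS-2322 trace:
// one thread iterates (serializes) while another mutates the same map.
public class TreeMapCmeRepro {
  public static void main(String[] args) throws InterruptedException {
    TreeMap<Integer, String> parts = new TreeMap<>();
    for (int i = 0; i < 10_000; i++) {
      parts.put(i, "part-" + i);
    }

    // "Flush" thread: serializes the map, like OmMultipartKeyInfo.getProto().
    Thread flusher = new Thread(() -> {
      for (int round = 0; round < 1_000; round++) {
        try {
          StringBuilder proto = new StringBuilder();
          parts.forEach((k, v) -> proto.append(k).append(v));
        } catch (ConcurrentModificationException e) {
          System.out.println("CME, as on the OMDoubleBufferFlushThread: " + e);
          return;
        }
      }
    });
    flusher.start();

    // "Handler" thread: keeps committing parts into the same map.
    for (int i = 10_000; i < 1_000_000 && flusher.isAlive(); i++) {
      parts.put(i, "part-" + i);
    }
    flusher.join();
  }
}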
[jira] [Updated] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng updated HDDS-2356: --- Description: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say it's VM0. I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path on VM0, while reading data from VM0 local disk and write to mount path. The dataset has various sizes of files from 0 byte to GB-level and it has a number of ~50,000 files. The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors related with Multipart upload. This error eventually causes the writing to terminate and OM to be closed. 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete Multipart Upload Request for bucket: ozone-test, key: 20191012/plc_1570863541668_927 8 MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: Complete Multipart Upload Failed: volume: s3c89e813c80ffcea9543004d57b2a1239bucket: ozone-testkey: 20191012/plc_1570863541668_9278 at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732) at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.completeMultipartUpload(OzoneManagerProtocolClientSideTranslatorPB .java:1104) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:66) at com.sun.proxy.$Proxy82.completeMultipartUpload(Unknown Source) at org.apache.hadoop.ozone.client.rpc.RpcClient.completeMultipartUpload(RpcClient.java:883) at org.apache.hadoop.ozone.client.OzoneBucket.completeMultipartUpload(OzoneBucket.java:445) at org.apache.hadoop.ozone.s3.endpoint.ObjectEndpoint.completeMultipartUpload(ObjectEndpoint.java:498) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:191) at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:200) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:103) at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:493) The following errors has been resolved in 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with exit status 2: OMDoubleBuffer flush threadOMDoubleBufferFlushThreadencountered Throwable error java.util.ConcurrentModificationException at java.util.TreeMap.forEach(TreeMap.java:1004) 
at org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111) at org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38) at org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31) at org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68) at org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125) at org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112) at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137) at java.util.Iterator.forEachRemaining(Iterator.java:116) at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135) at java.lang.Thread.run(Thread.java:745) 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG: was: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say it's VM0. I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path on VM0, while reading data from VM0 local disk and write to mount path. The dataset has various sizes of files from 0 byte to GB-level and it
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963684#comment-16963684 ] Li Cheng commented on HDDS-2356: [~bharat] I tried debug_s3 and debug_fuse in goofys, but I believe those flags put goofys into single-threaded mode, so the issue won't reproduce. What are you looking for in the audit logs? > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. > > 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with > exit status 2: OMDoubleBuffer flush > threadOMDoubleBufferFlushThreadencountered Throwable error > java.util.ConcurrentModificationException > at java.util.TreeMap.forEach(TreeMap.java:1004) > at > org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31) > at > org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68) > at > org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125) > at > org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135) > at java.lang.Thread.run(Thread.java:745) > 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG: -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Issue Comment Deleted] (HDDS-2186) Fix tests using MiniOzoneCluster for its memory related exceptions
[ https://issues.apache.org/jira/browse/HDDS-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng updated HDDS-2186: --- Comment: was deleted (was: https://github.com/apache/hadoop-ozone/pull/28) > Fix tests using MiniOzoneCluster for its memory related exceptions > -- > > Key: HDDS-2186 > URL: https://issues.apache.org/jira/browse/HDDS-2186 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: test >Affects Versions: 0.5.0 >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > Labels: flaky-test > Fix For: HDDS-1564 > > > After multi-raft usage, MiniOzoneCluster seems to be fishy and reports a > bunch of 'out of memory' exceptions in ratis. Attached sample stacks. > > 2019-09-26 15:12:22,824 > [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] > ERROR segmented.SegmentedRaftLogWorker > (SegmentedRaftLogWorker.java:run(323)) - > 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker > hit exception2019-09-26 15:12:22,824 > [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] > ERROR segmented.SegmentedRaftLogWorker > (SegmentedRaftLogWorker.java:run(323)) - > 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker > hit exceptionjava.lang.OutOfMemoryError: Direct buffer memory at > java.nio.Bits.reserveMemory(Bits.java:694) at > java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) at > java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) at > org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.(BufferedWriteChannel.java:41) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.(SegmentedRaftLogOutputStream.java:72) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:566) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:289) > at java.lang.Thread.run(Thread.java:748) > > which leads to: > 2019-09-26 15:12:23,029 [RATISCREATEPIPELINE1] ERROR > pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 > for c1f4d375-683b-42fe-983b-428a63aa88032019-09-26 15:12:23,029 > [RATISCREATEPIPELINE1] ERROR pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 > for > c1f4d375-683b-42fe-983b-428a63aa8803org.apache.ratis.protocol.TimeoutIOException: > deadline exceeded after 2999881264ns at > org.apache.ratis.grpc.GrpcUtil.tryUnwrapException(GrpcUtil.java:82) at > org.apache.ratis.grpc.GrpcUtil.unwrapException(GrpcUtil.java:75) at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.blockingCall(GrpcClientProtocolClient.java:178) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.groupAdd(GrpcClientProtocolClient.java:147) > at > org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:94) > at > org.apache.ratis.client.impl.RaftClientImpl.sendRequest(RaftClientImpl.java:278) > at > org.apache.ratis.client.impl.RaftClientImpl.groupAdd(RaftClientImpl.java:205) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$initializePipeline$1(RatisPipelineProvider.java:142) > at > 
org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$null$2(RatisPipelineProvider.java:177) > at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) at > java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291) at > java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731) at > java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) at > java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401) at > java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734) at > java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233) at > java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) at > java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:583) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$callRatisRpc$3(RatisPipelineProvider.java:171) > at >
[jira] [Commented] (HDDS-2186) Fix tests using MiniOzoneCluster for its memory related exceptions
[ https://issues.apache.org/jira/browse/HDDS-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962740#comment-16962740 ] Li Cheng commented on HDDS-2186: The OOM issue resolved in https://github.com/apache/hadoop-ozone/pull/28. > Fix tests using MiniOzoneCluster for its memory related exceptions > -- > > Key: HDDS-2186 > URL: https://issues.apache.org/jira/browse/HDDS-2186 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: test >Affects Versions: 0.5.0 >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > Labels: flaky-test > Fix For: 0.5.0 > > > After multi-raft usage, MiniOzoneCluster seems to be fishy and reports a > bunch of 'out of memory' exceptions in ratis. Attached sample stacks. > > 2019-09-26 15:12:22,824 > [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] > ERROR segmented.SegmentedRaftLogWorker > (SegmentedRaftLogWorker.java:run(323)) - > 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker > hit exception2019-09-26 15:12:22,824 > [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] > ERROR segmented.SegmentedRaftLogWorker > (SegmentedRaftLogWorker.java:run(323)) - > 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker > hit exceptionjava.lang.OutOfMemoryError: Direct buffer memory at > java.nio.Bits.reserveMemory(Bits.java:694) at > java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) at > java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) at > org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.(BufferedWriteChannel.java:41) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.(SegmentedRaftLogOutputStream.java:72) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:566) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:289) > at java.lang.Thread.run(Thread.java:748) > > which leads to: > 2019-09-26 15:12:23,029 [RATISCREATEPIPELINE1] ERROR > pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 > for c1f4d375-683b-42fe-983b-428a63aa88032019-09-26 15:12:23,029 > [RATISCREATEPIPELINE1] ERROR pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 > for > c1f4d375-683b-42fe-983b-428a63aa8803org.apache.ratis.protocol.TimeoutIOException: > deadline exceeded after 2999881264ns at > org.apache.ratis.grpc.GrpcUtil.tryUnwrapException(GrpcUtil.java:82) at > org.apache.ratis.grpc.GrpcUtil.unwrapException(GrpcUtil.java:75) at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.blockingCall(GrpcClientProtocolClient.java:178) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.groupAdd(GrpcClientProtocolClient.java:147) > at > org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:94) > at > org.apache.ratis.client.impl.RaftClientImpl.sendRequest(RaftClientImpl.java:278) > at > org.apache.ratis.client.impl.RaftClientImpl.groupAdd(RaftClientImpl.java:205) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$initializePipeline$1(RatisPipelineProvider.java:142) > at > 
org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$null$2(RatisPipelineProvider.java:177) > at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) at > java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291) at > java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731) at > java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) at > java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401) at > java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734) at > java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233) at > java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) at > java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:583) > at >
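The "Direct buffer memory" OutOfMemoryError in the quoted trace comes from ByteBuffer.allocateDirect() in Ratis's SegmentedRaftLogWorker. Direct buffers are capped by -XX:MaxDirectMemorySize (roughly the heap size by default), and with multi-raft each additional Raft group on a MiniOzoneCluster datanode allocates its own segment write buffer, so the cap can be exhausted even when the heap is fine. A self-contained demonstration of that failure mode (not Ozone code) follows; run it with something like -XX:MaxDirectMemorySize=16m to see it fail quickly. The fix in PR 28 presumably reduces per-test direct-buffer demand, though the details are not in this thread.

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Demonstrates exhausting the JVM's direct-memory cap, producing the same
// "Direct buffer memory" OutOfMemoryError seen in the Ratis trace above.
public class DirectBufferExhaustion {
  public static void main(String[] args) {
    // Hold references so nothing is freed, mimicking many live Raft log
    // segment workers each keeping its own direct write buffer.
    List<ByteBuffer> buffers = new ArrayList<>();
    try {
      while (true) {
        buffers.add(ByteBuffer.allocateDirect(1024 * 1024)); // 1 MB each
      }
    } catch (OutOfMemoryError e) {
      System.out.println("OOM after ~" + buffers.size() + " MB of direct memory: "
          + e.getMessage());
    }
  }
}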
[jira] [Resolved] (HDDS-2186) Fix tests using MiniOzoneCluster for its memory related exceptions
[ https://issues.apache.org/jira/browse/HDDS-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng resolved HDDS-2186. Fix Version/s: (was: 0.5.0) HDDS-1564 Target Version/s: HDDS-1564 Resolution: Fixed https://github.com/apache/hadoop-ozone/pull/28 > Fix tests using MiniOzoneCluster for its memory related exceptions > -- > > Key: HDDS-2186 > URL: https://issues.apache.org/jira/browse/HDDS-2186 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: test >Affects Versions: 0.5.0 >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > Labels: flaky-test > Fix For: HDDS-1564 > > > After multi-raft usage, MiniOzoneCluster seems to be fishy and reports a > bunch of 'out of memory' exceptions in ratis. Attached sample stacks. > > 2019-09-26 15:12:22,824 > [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] > ERROR segmented.SegmentedRaftLogWorker > (SegmentedRaftLogWorker.java:run(323)) - > 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker > hit exception2019-09-26 15:12:22,824 > [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] > ERROR segmented.SegmentedRaftLogWorker > (SegmentedRaftLogWorker.java:run(323)) - > 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker > hit exceptionjava.lang.OutOfMemoryError: Direct buffer memory at > java.nio.Bits.reserveMemory(Bits.java:694) at > java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) at > java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) at > org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.(BufferedWriteChannel.java:41) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.(SegmentedRaftLogOutputStream.java:72) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:566) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:289) > at java.lang.Thread.run(Thread.java:748) > > which leads to: > 2019-09-26 15:12:23,029 [RATISCREATEPIPELINE1] ERROR > pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 > for c1f4d375-683b-42fe-983b-428a63aa88032019-09-26 15:12:23,029 > [RATISCREATEPIPELINE1] ERROR pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 > for > c1f4d375-683b-42fe-983b-428a63aa8803org.apache.ratis.protocol.TimeoutIOException: > deadline exceeded after 2999881264ns at > org.apache.ratis.grpc.GrpcUtil.tryUnwrapException(GrpcUtil.java:82) at > org.apache.ratis.grpc.GrpcUtil.unwrapException(GrpcUtil.java:75) at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.blockingCall(GrpcClientProtocolClient.java:178) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.groupAdd(GrpcClientProtocolClient.java:147) > at > org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:94) > at > org.apache.ratis.client.impl.RaftClientImpl.sendRequest(RaftClientImpl.java:278) > at > org.apache.ratis.client.impl.RaftClientImpl.groupAdd(RaftClientImpl.java:205) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$initializePipeline$1(RatisPipelineProvider.java:142) > at > 
org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$null$2(RatisPipelineProvider.java:177) > at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) at > java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291) at > java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731) at > java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) at > java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401) at > java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734) at > java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233) at > java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) at > java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:583) > at >
[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961604#comment-16961604 ] Li Cheng edited comment on HDDS-2356 at 10/29/19 6:53 AM: -- [~bharat] I'm using Python to write to an OS path, something like:
{code:python}
import os

sub_files = os.listdir(dir)
num = 0
output = ""
for sub_file in sub_files:
    sub_file_path = dir + sub_file
    with open(dest_dir + sub_file_path, "w") as fw:
        with open(sub_file_path, "r") as fr:
            line = fr.readline()
            while line:
                num += 1
                output += line
                if (num >= 2000):
                    fw.write(output)
                    output = ""
                    num = 0
                line = fr.readline()
            fw.write(output)
{code}
Also, I'm using the S3 gateway to connect to Ozone and mounting a local file path via fuse (goofys). Have you tested the S3 gateway? Most unit tests go through RPC.

was (Author: timmylicheng): [~bharat] I'm using Python to write to an OS path, something like:
{code:python}
import os

sub_files = os.listdir(dir)
num = 0
output = ""
for sub_file in sub_files:
    sub_file_path = dir + sub_file
    with open(dest_dir + sub_file_path, "w") as fw:
        with open(sub_file_path, "r") as fr:
            line = fr.readline()
            while line:
                num += 1
                output += line
                if (num >= 2000):
                    fw.write(output)
                    output = ""
                    num = 0
                line = fr.readline()
            fw.write(output)
{code}
> Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. 
> > 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with > exit status 2: OMDoubleBuffer flush > threadOMDoubleBufferFlushThreadencountered Throwable error > java.util.ConcurrentModificationException > at java.util.TreeMap.forEach(TreeMap.java:1004) > at > org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31) > at > org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68) > at > org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125) > at > org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135) > at java.lang.Thread.run(Thread.java:745) > 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG: -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng reassigned HDDS-2356: -- Assignee: Bharat Viswanadham > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. > > 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with > exit status 2: OMDoubleBuffer flush > threadOMDoubleBufferFlushThreadencountered Throwable error > java.util.ConcurrentModificationException > at java.util.TreeMap.forEach(TreeMap.java:1004) > at > org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31) > at > org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68) > at > org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125) > at > org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135) > at java.lang.Thread.run(Thread.java:745) > 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG: -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961607#comment-16961607 ] Li Cheng commented on HDDS-2356: [~bharat] The long printing logs happen only in your branch tho. The full log is too huge to put here. You can think of the same logs as attached over and over again for multi megabyte size. > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Priority: Blocker > Fix For: 0.5.0 > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. > > 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with > exit status 2: OMDoubleBuffer flush > threadOMDoubleBufferFlushThreadencountered Throwable error > java.util.ConcurrentModificationException > at java.util.TreeMap.forEach(TreeMap.java:1004) > at > org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31) > at > org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68) > at > org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125) > at > org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135) > at java.lang.Thread.run(Thread.java:745) > 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG: -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961604#comment-16961604 ] Li Cheng edited comment on HDDS-2356 at 10/29/19 2:29 AM: -- [~bharat] I'm using Python to write to an OS path, something like:
{code:python}
import os

sub_files = os.listdir(dir)
num = 0
output = ""
for sub_file in sub_files:
    sub_file_path = dir + sub_file
    with open(dest_dir + sub_file_path, "w") as fw:
        with open(sub_file_path, "r") as fr:
            line = fr.readline()
            while line:
                num += 1
                output += line
                if (num >= 2000):
                    fw.write(output)
                    output = ""
                    num = 0
                line = fr.readline()
            fw.write(output)
{code}
was (Author: timmylicheng): [~bharat] I'm using Python to write to an OS path, something like:
{code:python}
for sub_file in sub_files:
    sub_file_path = dir + sub_file
    with open(dest_dir + sub_file_path, "w") as fw:
        with open(sub_file_path, "r") as fr:
            line = fr.readline()
            while line:
                num += 1
                output += line
                if (num >= 2000):
                    fw.write(output)
                    output = ""
                    num = 0
                line = fr.readline()
            fw.write(output)
{code}
> Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Priority: Blocker > Fix For: 0.5.0 > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. 
> > 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with > exit status 2: OMDoubleBuffer flush > threadOMDoubleBufferFlushThreadencountered Throwable error > java.util.ConcurrentModificationException > at java.util.TreeMap.forEach(TreeMap.java:1004) > at > org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31) > at > org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68) > at > org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125) > at > org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135) > at java.lang.Thread.run(Thread.java:745) > 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG: -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961604#comment-16961604 ] Li Cheng edited comment on HDDS-2356 at 10/29/19 2:25 AM: -- [~bharat] I'm using Python to write to an OS path, something like:
{code:python}
for sub_file in sub_files:
    sub_file_path = dir + sub_file
    with open(dest_dir + sub_file_path, "w") as fw:
        with open(sub_file_path, "r") as fr:
            line = fr.readline()
            while line:
                num += 1
                output += line
                if (num >= 2000):
                    fw.write(output)
                    output = ""
                    num = 0
                line = fr.readline()
            fw.write(output)
{code}
was (Author: timmylicheng): [~bharat] I'm using Python to write to an OS path, something like:
{code:java} // code placeholder {code}
for sub_file in sub_files:
    sub_file_path = dir + sub_file
    with open(dest_dir + sub_file_path, "w") as fw:
        with open(sub_file_path, "r") as fr:
            line = fr.readline()
            while line:
                num += 1
                output += line
                if (num >= 2000):
                    fw.write(output)
                    output = ""
                    num = 0
                line = fr.readline()
            fw.write(output)
> Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Priority: Blocker > Fix For: 0.5.0 > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. 
> > 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with > exit status 2: OMDoubleBuffer flush > threadOMDoubleBufferFlushThreadencountered Throwable error > java.util.ConcurrentModificationException > at java.util.TreeMap.forEach(TreeMap.java:1004) > at > org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31) > at > org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68) > at > org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125) > at > org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135) > at java.lang.Thread.run(Thread.java:745) > 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG: -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961604#comment-16961604 ] Li Cheng edited comment on HDDS-2356 at 10/29/19 2:24 AM: -- [~bharat] I'm using Python to write to an OS path, something like:
{code:java} // code placeholder {code}
for sub_file in sub_files:
    sub_file_path = dir + sub_file
    with open(dest_dir + sub_file_path, "w") as fw:
        with open(sub_file_path, "r") as fr:
            line = fr.readline()
            while line:
                num += 1
                output += line
                if (num >= 2000):
                    fw.write(output)
                    output = ""
                    num = 0
                line = fr.readline()
            fw.write(output)
was (Author: timmylicheng): [~bharat] I'm using Python to write to an OS path, something like:
for sub_file in sub_files:
    sub_file_path = dir + sub_file
    with open(dest_dir + sub_file_path, "w") as fw:
        with open(sub_file_path, "r") as fr:
            line = fr.readline()
            while line:
                num += 1
                output += line
                if (num >= 2000):
                    fw.write(output)
                    output = ""
                    num = 0
                line = fr.readline()
            fw.write(output)
> Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Priority: Blocker > Fix For: 0.5.0 > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. > > 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with > exit status 2: OMDoubleBuffer flush > threadOMDoubleBufferFlushThreadencountered Throwable error > java.util.ConcurrentModificationException > at java.util.TreeMap.forEach(TreeMap.java:1004) > at > org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31) > at > org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68) > at > org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125) > at > org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135) > at java.lang.Thread.run(Thread.java:745) > 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG: -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961604#comment-16961604 ] Li Cheng commented on HDDS-2356: [~bharat] I'm using Python to write to an OS path, something like:
for sub_file in sub_files:
    sub_file_path = dir + sub_file
    with open(dest_dir + sub_file_path, "w") as fw:
        with open(sub_file_path, "r") as fr:
            line = fr.readline()
            while line:
                num += 1
                output += line
                if (num >= 2000):
                    fw.write(output)
                    output = ""
                    num = 0
                line = fr.readline()
            fw.write(output)
> Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Priority: Blocker > Fix For: 0.5.0 > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. > > 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with > exit status 2: OMDoubleBuffer flush > threadOMDoubleBufferFlushThreadencountered Throwable error > java.util.ConcurrentModificationException > at java.util.TreeMap.forEach(TreeMap.java:1004) > at > org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31) > at > org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68) > at > org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125) > at > org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135) > at java.lang.Thread.run(Thread.java:745) > 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG: -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
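A side note on the batching in the script above: the 2000-line accumulation can be replaced by byte-size buffering from the standard library, which bounds memory use even when individual lines are very long. A minimal sketch; copy_tree_buffered is a hypothetical helper for illustration, not part of the reported test setup:
{code:python}
import os
import shutil

def copy_tree_buffered(src_dir, dest_dir, buf_size=1024 * 1024):
    # Copies every file in src_dir into dest_dir through a fixed-size
    # buffer; same effect as the line-count batching loop above, but
    # with a hard cap on memory held per copy.
    for name in os.listdir(src_dir):
        with open(os.path.join(src_dir, name), "rb") as fr, \
             open(os.path.join(dest_dir, name), "wb") as fw:
            shutil.copyfileobj(fr, fw, buf_size)
{code}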
[jira] [Reopened] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng reopened HDDS-2356: Assignee: (was: Bharat Viswanadham) > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Priority: Blocker > Fix For: 0.5.0 > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. > > 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with > exit status 2: OMDoubleBuffer flush > threadOMDoubleBufferFlushThreadencountered Throwable error > java.util.ConcurrentModificationException > at java.util.TreeMap.forEach(TreeMap.java:1004) > at > org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31) > at > org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68) > at > org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125) > at > org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135) > at java.lang.Thread.run(Thread.java:745) > 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG: -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960870#comment-16960870 ] Li Cheng commented on HDDS-2356: I take this Jira to track issues that seem to be related with Multipart upload in my testing. Reopen this. > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. > > 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with > exit status 2: OMDoubleBuffer flush > threadOMDoubleBufferFlushThreadencountered Throwable error > java.util.ConcurrentModificationException > at java.util.TreeMap.forEach(TreeMap.java:1004) > at > org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31) > at > org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68) > at > org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125) > at > org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135) > at java.lang.Thread.run(Thread.java:745) > 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG: -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960866#comment-16960866 ] Li Cheng commented on HDDS-2356: MISMATCH_MULTIPART_LIST seems to be a recurring error. Never be able to finish this. > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. > > 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with > exit status 2: OMDoubleBuffer flush > threadOMDoubleBufferFlushThreadencountered Throwable error > java.util.ConcurrentModificationException > at java.util.TreeMap.forEach(TreeMap.java:1004) > at > org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31) > at > org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68) > at > org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125) > at > org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135) > at java.lang.Thread.run(Thread.java:745) > 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG: -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2322) DoubleBuffer flush termination and OM shutdown's after that.
[ https://issues.apache.org/jira/browse/HDDS-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960802#comment-16960802 ] Li Cheng commented on HDDS-2322: https://issues.apache.org/jira/browse/HDDS-2356 is still having issues. Do you mean to track here? [~bharat] > DoubleBuffer flush termination and OM shutdown's after that. > > > Key: HDDS-2322 > URL: https://issues.apache.org/jira/browse/HDDS-2322 > Project: Hadoop Distributed Data Store > Issue Type: Task >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > om1_1 | 2019-10-18 00:34:45,317 [OMDoubleBufferFlushThread] ERROR > - Terminating with exit status 2: OMDoubleBuffer flush > threadOMDoubleBufferFlushThreadencountered Throwable error > om1_1 | java.util.ConcurrentModificationException > om1_1 | at > java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1660) > om1_1 | at > java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) > om1_1 | at > java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) > om1_1 | at > java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913) > om1_1 | at > java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > om1_1 | at > java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578) > om1_1 | at > org.apache.hadoop.ozone.om.helpers.OmKeyLocationInfoGroup.getProtobuf(OmKeyLocationInfoGroup.java:65) > om1_1 | at > java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195) > om1_1 | at > java.base/java.util.Collections$2.tryAdvance(Collections.java:4745) > om1_1 | at > java.base/java.util.Collections$2.forEachRemaining(Collections.java:4753) > om1_1 | at > java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) > om1_1 | at > java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) > om1_1 | at > java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913) > om1_1 | at > java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > om1_1 | at > java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578) > om1_1 | at > org.apache.hadoop.ozone.om.helpers.OmKeyInfo.getProtobuf(OmKeyInfo.java:362) > om1_1 | at > org.apache.hadoop.ozone.om.codec.OmKeyInfoCodec.toPersistedFormat(OmKeyInfoCodec.java:37) > om1_1 | at > org.apache.hadoop.ozone.om.codec.OmKeyInfoCodec.toPersistedFormat(OmKeyInfoCodec.java:31) > om1_1 | at > org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68) > om1_1 | at > org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125) > om1_1 | at > org.apache.hadoop.ozone.om.response.key.OMKeyCreateResponse.addToDBBatch(OMKeyCreateResponse.java:58) > om1_1 | at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:139) > om1_1 | at > java.base/java.util.Iterator.forEachRemaining(Iterator.java:133) > om1_1 | at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:137) > om1_1 | at java.base/java.lang.Thread.run(Thread.java:834) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960799#comment-16960799 ] Li Cheng commented on HDDS-2356: [~bharat] In terms of reproduction, I have a dataset that includes small files as well as big files, and I'm using Ozone's S3 gateway and mounting the Ozone cluster to a local path via goofys. All the data is written recursively to the mount path, which essentially leads to the Ozone cluster. The Ozone cluster is deployed on a 3-node VM environment and each VM has only 1 disk for Ozone data writing. I think it's a pretty simple scenario to reproduce. The sole operation is writing to the Ozone cluster through fuse. > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. > > 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with > exit status 2: OMDoubleBuffer flush > threadOMDoubleBufferFlushThreadencountered Throwable error > java.util.ConcurrentModificationException > at java.util.TreeMap.forEach(TreeMap.java:1004) > at > org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31) > at > org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68) > at > org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125) > at > org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135) > at java.lang.Thread.run(Thread.java:745) > 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG: -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
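For anyone repeating this reproduction, a small generator that approximates the dataset shape described above (mixed sizes from 0 bytes up to tens of MB); the size buckets and file count here are placeholders, so scale them toward the real ~50,000-file, GB-level corpus as resources allow:
{code:python}
import os
import random

SIZES = [0, 1, 4 * 1024, 1024 * 1024, 64 * 1024 * 1024]  # 0 B .. 64 MB

def make_dataset(root, count=1000):
    # Writes `count` files with randomly chosen sizes into `root`,
    # reusing one random 1 MB chunk to keep generation fast.
    os.makedirs(root, exist_ok=True)
    chunk = os.urandom(1024 * 1024)
    for i in range(count):
        size = random.choice(SIZES)
        with open(os.path.join(root, "f%05d" % i), "wb") as f:
            written = 0
            while written < size:
                n = min(len(chunk), size - written)
                f.write(chunk[:n])
                written += n
{code}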
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960796#comment-16960796 ] Li Cheng commented on HDDS-2356: Also, it prints the same pipeline ID in the s3g logs like crazy. I wonder if that's expected. [~bharat] 2019-10-28 11:43:08,912 [qtp1383524016-24] INFO - Allocating block with ExcludeList \{datanodes = [], containerIds = [], pipelineIds = []} PipelineID=3c94d3f5-3c0e-4994-9c63-dc487071be1a, PipelineID=3c94d3f5-3c0e-4994-9c63-dc487071be1a, PipelineID=3c94d3f5-3c0e-4994-9c63-dc487071be1a, PipelineID=3c94d3f5-3c0e-4994-9c63-dc487071be1a, [... the same PipelineID repeated for the rest of the paste ...] > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >
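A quick way to quantify that repetition before reporting it against s3g: a hypothetical triage helper that counts PipelineID occurrences in a log file, so the spam can be summarized as "ID X repeated N times" rather than pasted raw:
{code:python}
import re
import sys
from collections import Counter

def count_pipeline_ids(path):
    # Tallies every PipelineID token in an s3g log file.
    pattern = re.compile(r"PipelineID=([0-9a-f-]+)")
    counts = Counter()
    with open(path) as f:
        for line in f:
            counts.update(pattern.findall(line))
    return counts.most_common(10)

if __name__ == "__main__":
    for pid, n in count_pipeline_ids(sys.argv[1]):
        print(n, pid)
{code}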
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960744#comment-16960744 ] Li Cheng commented on HDDS-2356: [~bharat] shows up another error. See the stack: 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete Multipart Upload Request for bucket: ozone-test, key: 20191012/plc_1570863541668_927 8 MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: Complete Multipart Upload Failed: volume: s3c89e813c80ffcea9543004d57b2a1239bucket: ozone-testkey: 20191012/plc_1570863541668_9278 at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732) at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.completeMultipartUpload(OzoneManagerProtocolClientSideTranslatorPB .java:1104) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:66) at com.sun.proxy.$Proxy82.completeMultipartUpload(Unknown Source) at org.apache.hadoop.ozone.client.rpc.RpcClient.completeMultipartUpload(RpcClient.java:883) at org.apache.hadoop.ozone.client.OzoneBucket.completeMultipartUpload(OzoneBucket.java:445) at org.apache.hadoop.ozone.s3.endpoint.ObjectEndpoint.completeMultipartUpload(ObjectEndpoint.java:498) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:191) at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:200) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:103) at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:493) at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:415) at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:104) at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:277) at org.glassfish.jersey.internal.Errors$1.call(Errors.java:272) at org.glassfish.jersey.internal.Errors$1.call(Errors.java:268) at org.glassfish.jersey.internal.Errors.process(Errors.java:316) at org.glassfish.jersey.internal.Errors.process(Errors.java:298) at org.glassfish.jersey.internal.Errors.process(Errors.java:268) at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:289) at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:256) at 
org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:703) at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:416) at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:370) at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:389) at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:342) at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:229) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:840) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1780) at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1609) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) at
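MISMATCH_MULTIPART_LIST is OM rejecting a complete-multipart request whose part list does not match what was committed. A minimal boto3 sketch of the happy path against the S3 gateway; the endpoint, credentials, and bucket are placeholders, and the gateway's port is assumed to be the default 9878:
{code:python}
import boto3

# Placeholder endpoint and credentials for an Ozone S3 gateway.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9878",
    aws_access_key_id="testuser",
    aws_secret_access_key="testsecret",
)

def upload_multipart(bucket, key, parts_data):
    up = s3.create_multipart_upload(Bucket=bucket, Key=key)
    upload_id = up["UploadId"]
    parts = []
    for number, body in enumerate(parts_data, start=1):
        r = s3.upload_part(Bucket=bucket, Key=key, PartNumber=number,
                           UploadId=upload_id, Body=body)
        parts.append({"PartNumber": number, "ETag": r["ETag"]})
    # The completion list must exactly match the committed parts
    # (numbers and ETags); a divergence between this list and what OM
    # recorded is what surfaces as MISMATCH_MULTIPART_LIST.
    s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
                                 MultipartUpload={"Parts": parts})
{code}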
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959468#comment-16959468 ] Li Cheng commented on HDDS-2356: [~bharat]Yea, you are right. It does happen randomly. Seeing it again. When will HDDS-2322 be merged into master? > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. > > 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with > exit status 2: OMDoubleBuffer flush > threadOMDoubleBufferFlushThreadencountered Throwable error > java.util.ConcurrentModificationException > at java.util.TreeMap.forEach(TreeMap.java:1004) > at > org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31) > at > org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68) > at > org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125) > at > org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135) > at java.lang.Thread.run(Thread.java:745) > 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG: -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959407#comment-16959407 ] Li Cheng commented on HDDS-2356: Quick update: I tried giving Ozone more handlers (roughly 10x more) and no longer see this error. See the attached properties below. However, writing now fails because no more blocks get allocated. I guess my cluster cannot keep up with the write load.
ozone.scm.handler.count.key = 128 (OZONE, MANAGEMENT, PERFORMANCE): The number of RPC handler threads for each SCM service endpoint. The default is appropriate for small clusters (tens of nodes). Set a value that is appropriate for the cluster size. Generally, HDFS recommends that the RPC handler count be set to 20 * log2(cluster size), with an upper limit of 200. However, SCM will not have the same amount of traffic as the Namenode, so a much smaller value will work well too.
ozone.om.handler.count.key = 256 (OM, PERFORMANCE): The number of RPC handler threads for OM service endpoints.
dfs.container.ratis.num.container.op.executors = 128 (OZONE, RATIS, PERFORMANCE): Number of executors that will be used by Ratis to execute container ops (10 by default).
dfs.container.ratis.num.write.chunk.threads = 512 (OZONE, RATIS, PERFORMANCE): Maximum number of threads in the thread pool that Ratis will use for writing chunks (60 by default).
> Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a FUSE client and enable the Ozone S3 gateway to mount Ozone to a > path on VM0, reading data from VM0's local disk and writing to the mount path. The > dataset has ~50,000 files whose sizes range from 0 bytes to GB-level. > Writing is slow (1 GB in ~10 minutes) and it stops after around 4 GB. Looking > at the hadoop-root-om-VM_50_210_centos.out log, I see the OM throwing errors > related to multipart upload. This error eventually causes the writing to > terminate and the OM to shut down. 
> > 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with > exit status 2: OMDoubleBuffer flush > threadOMDoubleBufferFlushThreadencountered Throwable error > java.util.ConcurrentModificationException > at java.util.TreeMap.forEach(TreeMap.java:1004) > at > org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31) > at > org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68) > at > org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125) > at > org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135) > at java.lang.Thread.run(Thread.java:745) > 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG: -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
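For reference, the handler overrides quoted in the comment above can also be applied programmatically when experimenting. A minimal sketch, assuming the OzoneConfiguration class in org.apache.hadoop.hdds.conf (a subclass of Hadoop's Configuration); on a real cluster the same name/value pairs would go into ozone-site.xml:

{code:java}
import org.apache.hadoop.hdds.conf.OzoneConfiguration;

public class HandlerTuningSketch {
  public static void main(String[] args) {
    OzoneConfiguration conf = new OzoneConfiguration();
    // Name/value pairs exactly as quoted in the comment above,
    // roughly 10x the stock defaults.
    conf.setInt("ozone.scm.handler.count.key", 128);
    conf.setInt("ozone.om.handler.count.key", 256);
    conf.setInt("dfs.container.ratis.num.container.op.executors", 128);
    conf.setInt("dfs.container.ratis.num.write.chunk.threads", 512);
    System.out.println(conf.getInt("ozone.om.handler.count.key", -1)); // prints 256
  }
}
{code}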
[jira] [Created] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
Li Cheng created HDDS-2356: -- Summary: Multipart upload report errors while writing to ozone Ratis pipeline Key: HDDS-2356 URL: https://issues.apache.org/jira/browse/HDDS-2356 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Manager Affects Versions: 0.4.1 Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM Reporter: Li Cheng Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say it's VM0. I use goofys as a FUSE client and enable the Ozone S3 gateway to mount Ozone to a path on VM0, reading data from VM0's local disk and writing to the mount path. The dataset has ~50,000 files whose sizes range from 0 bytes to GB-level. Writing is slow (1 GB in ~10 minutes) and it stops after around 4 GB. Looking at the hadoop-root-om-VM_50_210_centos.out log, I see the OM throwing errors related to multipart upload. This error eventually causes the writing to terminate and the OM to shut down. 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with exit status 2: OMDoubleBuffer flush threadOMDoubleBufferFlushThreadencountered Throwable error java.util.ConcurrentModificationException at java.util.TreeMap.forEach(TreeMap.java:1004) at org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111) at org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38) at org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31) at org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68) at org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125) at org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112) at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137) at java.util.Iterator.forEachRemaining(Iterator.java:116) at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135) at java.lang.Thread.run(Thread.java:745) 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG: -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-1576) Ensure constraint of one raft log per disk is met unless fast media
[ https://issues.apache.org/jira/browse/HDDS-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng reassigned HDDS-1576: -- Assignee: Li Cheng > Ensure constraint of one raft log per disk is met unless fast media > --- > > Key: HDDS-1576 > URL: https://issues.apache.org/jira/browse/HDDS-1576 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Datanode, SCM >Reporter: Siddharth Wagle >Assignee: Li Cheng >Priority: Major > > SCM should not try to create a raft group by placing the raft log on a disk > that is already used by existing Ratis ring for an open pipeline. > This constraint would have to be applied by either throwing an exception > during pipeline creation or by looking at configs on the SCM side. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
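The constraint in HDDS-1576 amounts to a per-disk reservation on the SCM side. A rough sketch of that bookkeeping, with entirely hypothetical names (RaftLogDiskGuard, tryReserve), since the JIRA leaves open whether this is enforced by an exception at pipeline creation or by configs:

{code:java}
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical placement guard: one Raft log per disk unless the media is fast. */
class RaftLogDiskGuard {
  private final Set<Path> disksWithRaftLog = new HashSet<>();
  private final boolean fastMedia; // e.g. driven by an SCM-side config flag

  RaftLogDiskGuard(boolean fastMedia) {
    this.fastMedia = fastMedia;
  }

  /** Reserve a disk for a new Raft log; false if an open pipeline already uses it. */
  synchronized boolean tryReserve(Path disk) {
    return fastMedia || disksWithRaftLog.add(disk);
  }

  /** Release the disk when its pipeline is torn down. */
  synchronized void release(Path disk) {
    disksWithRaftLog.remove(disk);
  }
}
{code}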
[jira] [Commented] (HDDS-1574) Ensure same datanodes are not a part of multiple pipelines
[ https://issues.apache.org/jira/browse/HDDS-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955811#comment-16955811 ] Li Cheng commented on HDDS-1574: [~swagle] Could you elaborate a bit more to help me understand this task? Do you mean we don't want two pipelines to share exactly the same group of datanodes as members? > Ensure same datanodes are not a part of multiple pipelines > -- > > Key: HDDS-1574 > URL: https://issues.apache.org/jira/browse/HDDS-1574 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Siddharth Wagle >Assignee: Li Cheng >Priority: Major > > Details in design doc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
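If the answer to the question above is yes (no two pipelines with exactly the same member set), the check reduces to set equality over datanode IDs. A hypothetical sketch of that idea, not actual SCM code:

{code:java}
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical check: reject a pipeline whose member set equals an existing one. */
class UniqueMembershipCheck {
  private final Set<Set<String>> openPipelineMemberSets = new HashSet<>();

  synchronized boolean allowCreate(Collection<String> candidateDatanodeIds) {
    // Set equality ignores ordering, so any permutation of the same
    // datanodes counts as the same membership.
    return openPipelineMemberSets.add(new HashSet<>(candidateDatanodeIds));
  }

  synchronized void onPipelineClosed(Collection<String> memberDatanodeIds) {
    openPipelineMemberSets.remove(new HashSet<>(memberDatanodeIds));
  }
}
{code}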
[jira] [Commented] (HDDS-1570) Refactor heartbeat reports to report all the pipelines that are open
[ https://issues.apache.org/jira/browse/HDDS-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953672#comment-16953672 ] Li Cheng commented on HDDS-1570: As I look into this JIRA, I would like to discuss a few points about it: 1. SCMHeartbeatRequestProto already includes a list of PipelineReport ([https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/proto/StorageContainerDatanodeProtocol.proto#L128]) 2. HeartbeatEndpointTask adds all pipeline reports to every heartbeat request ([https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/states/endpoint/HeartbeatEndpointTask.java#L182]) 3. XceiverServerRatis appears to handle all Raft groups together, and every group has one pipelineId; hence, every Xceiver server has all the pipeline reports. ([https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/XceiverServerRatis.java#L599]) So it seems the heartbeat already carries the full set of pipelineIds on both the sender and receiver sides. Do we still have to make any changes to the heartbeat reports? Also, what is the benefit of reporting only open pipelines? [~swagle] [~sammichen] [~xyao] > Refactor heartbeat reports to report all the pipelines that are open > > > Key: HDDS-1570 > URL: https://issues.apache.org/jira/browse/HDDS-1570 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Datanode >Reporter: Siddharth Wagle >Assignee: Li Cheng >Priority: Major > > Presently the pipeline report only reports a single pipeline id. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
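Restating points 1-3 of the comment above in code form: the datanode side already builds one report per Raft group into each heartbeat. The sketch below is a hypothetical paraphrase with stand-in types (PipelineReport here is not the generated proto class); the linked proto and HeartbeatEndpointTask sources are authoritative:

{code:java}
import java.util.ArrayList;
import java.util.List;

/** Hypothetical paraphrase: every open Raft group yields one report per heartbeat. */
class HeartbeatReportSketch {
  static class PipelineReport { // stand-in for the proto message
    final String pipelineId;
    PipelineReport(String pipelineId) { this.pipelineId = pipelineId; }
  }

  /** The datanode reports all pipelines it serves, not just a single one. */
  static List<PipelineReport> buildReports(List<String> raftGroupPipelineIds) {
    List<PipelineReport> reports = new ArrayList<>();
    for (String id : raftGroupPipelineIds) {
      reports.add(new PipelineReport(id));
    }
    return reports;
  }
}
{code}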
[jira] [Commented] (HDDS-1569) Add ability to SCM for creating multiple pipelines with same datanode
[ https://issues.apache.org/jira/browse/HDDS-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953570#comment-16953570 ] Li Cheng commented on HDDS-1569: PR [https://github.com/apache/hadoop-ozone/pull/28] is under review. > Add ability to SCM for creating multiple pipelines with same datanode > - > > Key: HDDS-1569 > URL: https://issues.apache.org/jira/browse/HDDS-1569 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Siddharth Wagle >Assignee: Li Cheng >Priority: Major > Labels: pull-request-available > Time Spent: 8h 10m > Remaining Estimate: 0h > > - Refactor _RatisPipelineProvider.create()_ to be able to create pipelines > with datanodes that are not a part of sufficient pipelines > - Define soft and hard upper bounds for pipeline membership > - Create SCMAllocationManager that can be leveraged to get a candidate set of > datanodes based on placement policies > - Add the datanodes to internal datastructures -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
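On the soft and hard upper bound bullet in HDDS-1569: a minimal sketch of how such bounds could gate candidate datanodes. The limit values and names are assumptions for illustration, not taken from the design doc:

{code:java}
/** Hypothetical soft/hard membership bounds; the design doc defines the real values. */
class PipelineMembershipBounds {
  static final int SOFT_LIMIT = 3; // prefer datanodes below this pipeline count
  static final int HARD_LIMIT = 5; // never place a new pipeline beyond this count

  /** Candidate selection: datanodes that are not yet part of sufficient pipelines. */
  static boolean isCandidate(int pipelinesOnDatanode) {
    return pipelinesOnDatanode < SOFT_LIMIT;
  }

  /** Violation check, e.g. for a scrubber or per-datanode pipeline metrics. */
  static boolean isViolation(int pipelinesOnDatanode) {
    return pipelinesOnDatanode > HARD_LIMIT;
  }
}
{code}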
[jira] [Issue Comment Deleted] (HDDS-2186) Fix tests using MiniOzoneCluster for its memory related exceptions
[ https://issues.apache.org/jira/browse/HDDS-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng updated HDDS-2186: --- Comment: was deleted (was: [https://github.com/apache/hadoop/pull/1431]) > Fix tests using MiniOzoneCluster for its memory related exceptions > -- > > Key: HDDS-2186 > URL: https://issues.apache.org/jira/browse/HDDS-2186 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: test >Affects Versions: 0.5.0 >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > Labels: flaky-test > Fix For: 0.5.0 > > > After multi-raft usage, MiniOzoneCluster seems to be fishy and reports a > bunch of 'out of memory' exceptions in ratis. Attached sample stacks. > > 2019-09-26 15:12:22,824 > [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] > ERROR segmented.SegmentedRaftLogWorker > (SegmentedRaftLogWorker.java:run(323)) - > 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker > hit exception2019-09-26 15:12:22,824 > [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] > ERROR segmented.SegmentedRaftLogWorker > (SegmentedRaftLogWorker.java:run(323)) - > 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker > hit exceptionjava.lang.OutOfMemoryError: Direct buffer memory at > java.nio.Bits.reserveMemory(Bits.java:694) at > java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) at > java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) at > org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.(BufferedWriteChannel.java:41) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.(SegmentedRaftLogOutputStream.java:72) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:566) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:289) > at java.lang.Thread.run(Thread.java:748) > > which leads to: > 2019-09-26 15:12:23,029 [RATISCREATEPIPELINE1] ERROR > pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 > for c1f4d375-683b-42fe-983b-428a63aa88032019-09-26 15:12:23,029 > [RATISCREATEPIPELINE1] ERROR pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 > for > c1f4d375-683b-42fe-983b-428a63aa8803org.apache.ratis.protocol.TimeoutIOException: > deadline exceeded after 2999881264ns at > org.apache.ratis.grpc.GrpcUtil.tryUnwrapException(GrpcUtil.java:82) at > org.apache.ratis.grpc.GrpcUtil.unwrapException(GrpcUtil.java:75) at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.blockingCall(GrpcClientProtocolClient.java:178) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.groupAdd(GrpcClientProtocolClient.java:147) > at > org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:94) > at > org.apache.ratis.client.impl.RaftClientImpl.sendRequest(RaftClientImpl.java:278) > at > org.apache.ratis.client.impl.RaftClientImpl.groupAdd(RaftClientImpl.java:205) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$initializePipeline$1(RatisPipelineProvider.java:142) > at > 
org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$null$2(RatisPipelineProvider.java:177) > at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) at > java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291) at > java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731) at > java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) at > java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401) at > java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734) at > java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233) at > java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) at > java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:583) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$callRatisRpc$3(RatisPipelineProvider.java:171) > at >
[jira] [Commented] (HDDS-2186) Fix tests using MiniOzoneCluster for its memory related exceptions
[ https://issues.apache.org/jira/browse/HDDS-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953568#comment-16953568 ] Li Cheng commented on HDDS-2186: Try to resolve this in [https://github.com/apache/hadoop-ozone/pull/28] > Fix tests using MiniOzoneCluster for its memory related exceptions > -- > > Key: HDDS-2186 > URL: https://issues.apache.org/jira/browse/HDDS-2186 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: test >Affects Versions: 0.5.0 >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > Labels: flaky-test > Fix For: 0.5.0 > > > After multi-raft usage, MiniOzoneCluster seems to be fishy and reports a > bunch of 'out of memory' exceptions in ratis. Attached sample stacks. > > 2019-09-26 15:12:22,824 > [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] > ERROR segmented.SegmentedRaftLogWorker > (SegmentedRaftLogWorker.java:run(323)) - > 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker > hit exception2019-09-26 15:12:22,824 > [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] > ERROR segmented.SegmentedRaftLogWorker > (SegmentedRaftLogWorker.java:run(323)) - > 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker > hit exceptionjava.lang.OutOfMemoryError: Direct buffer memory at > java.nio.Bits.reserveMemory(Bits.java:694) at > java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) at > java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) at > org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.(BufferedWriteChannel.java:41) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.(SegmentedRaftLogOutputStream.java:72) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:566) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:289) > at java.lang.Thread.run(Thread.java:748) > > which leads to: > 2019-09-26 15:12:23,029 [RATISCREATEPIPELINE1] ERROR > pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 > for c1f4d375-683b-42fe-983b-428a63aa88032019-09-26 15:12:23,029 > [RATISCREATEPIPELINE1] ERROR pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 > for > c1f4d375-683b-42fe-983b-428a63aa8803org.apache.ratis.protocol.TimeoutIOException: > deadline exceeded after 2999881264ns at > org.apache.ratis.grpc.GrpcUtil.tryUnwrapException(GrpcUtil.java:82) at > org.apache.ratis.grpc.GrpcUtil.unwrapException(GrpcUtil.java:75) at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.blockingCall(GrpcClientProtocolClient.java:178) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.groupAdd(GrpcClientProtocolClient.java:147) > at > org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:94) > at > org.apache.ratis.client.impl.RaftClientImpl.sendRequest(RaftClientImpl.java:278) > at > org.apache.ratis.client.impl.RaftClientImpl.groupAdd(RaftClientImpl.java:205) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$initializePipeline$1(RatisPipelineProvider.java:142) > at > 
org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$null$2(RatisPipelineProvider.java:177) > at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) at > java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291) at > java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731) at > java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) at > java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401) at > java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734) at > java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233) at > java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) at > java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:583) > at >
[jira] [Work started] (HDDS-2186) Fix tests using MiniOzoneCluster for its memory related exceptions
[ https://issues.apache.org/jira/browse/HDDS-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDDS-2186 started by Li Cheng. -- > Fix tests using MiniOzoneCluster for its memory related exceptions > -- > > Key: HDDS-2186 > URL: https://issues.apache.org/jira/browse/HDDS-2186 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: test >Affects Versions: HDDS-1564 >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > Labels: flaky-test > Fix For: HDDS-1564 > > > After multi-raft usage, MiniOzoneCluster seems to be fishy and reports a > bunch of 'out of memory' exceptions in ratis. Attached sample stacks. > > 2019-09-26 15:12:22,824 > [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] > ERROR segmented.SegmentedRaftLogWorker > (SegmentedRaftLogWorker.java:run(323)) - > 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker > hit exception2019-09-26 15:12:22,824 > [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] > ERROR segmented.SegmentedRaftLogWorker > (SegmentedRaftLogWorker.java:run(323)) - > 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker > hit exceptionjava.lang.OutOfMemoryError: Direct buffer memory at > java.nio.Bits.reserveMemory(Bits.java:694) at > java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) at > java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) at > org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.(BufferedWriteChannel.java:41) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.(SegmentedRaftLogOutputStream.java:72) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:566) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:289) > at java.lang.Thread.run(Thread.java:748) > > which leads to: > 2019-09-26 15:12:23,029 [RATISCREATEPIPELINE1] ERROR > pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 > for c1f4d375-683b-42fe-983b-428a63aa88032019-09-26 15:12:23,029 > [RATISCREATEPIPELINE1] ERROR pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 > for > c1f4d375-683b-42fe-983b-428a63aa8803org.apache.ratis.protocol.TimeoutIOException: > deadline exceeded after 2999881264ns at > org.apache.ratis.grpc.GrpcUtil.tryUnwrapException(GrpcUtil.java:82) at > org.apache.ratis.grpc.GrpcUtil.unwrapException(GrpcUtil.java:75) at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.blockingCall(GrpcClientProtocolClient.java:178) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.groupAdd(GrpcClientProtocolClient.java:147) > at > org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:94) > at > org.apache.ratis.client.impl.RaftClientImpl.sendRequest(RaftClientImpl.java:278) > at > org.apache.ratis.client.impl.RaftClientImpl.groupAdd(RaftClientImpl.java:205) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$initializePipeline$1(RatisPipelineProvider.java:142) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$null$2(RatisPipelineProvider.java:177) > at 
java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) at > java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291) at > java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731) at > java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) at > java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401) at > java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734) at > java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233) at > java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) at > java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:583) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$callRatisRpc$3(RatisPipelineProvider.java:171) > at >
[jira] [Commented] (HDDS-2186) Fix tests using MiniOzoneCluster for its memory related exceptions
[ https://issues.apache.org/jira/browse/HDDS-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949933#comment-16949933 ] Li Cheng commented on HDDS-2186: [https://github.com/apache/hadoop/pull/1431] > Fix tests using MiniOzoneCluster for its memory related exceptions > -- > > Key: HDDS-2186 > URL: https://issues.apache.org/jira/browse/HDDS-2186 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: test >Affects Versions: HDDS-1564 >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > Labels: flaky-test > Fix For: HDDS-1564 > > > After multi-raft usage, MiniOzoneCluster seems to be fishy and reports a > bunch of 'out of memory' exceptions in ratis. Attached sample stacks. > > 2019-09-26 15:12:22,824 > [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] > ERROR segmented.SegmentedRaftLogWorker > (SegmentedRaftLogWorker.java:run(323)) - > 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker > hit exception2019-09-26 15:12:22,824 > [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] > ERROR segmented.SegmentedRaftLogWorker > (SegmentedRaftLogWorker.java:run(323)) - > 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker > hit exceptionjava.lang.OutOfMemoryError: Direct buffer memory at > java.nio.Bits.reserveMemory(Bits.java:694) at > java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) at > java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) at > org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.(BufferedWriteChannel.java:41) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.(SegmentedRaftLogOutputStream.java:72) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:566) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:289) > at java.lang.Thread.run(Thread.java:748) > > which leads to: > 2019-09-26 15:12:23,029 [RATISCREATEPIPELINE1] ERROR > pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 > for c1f4d375-683b-42fe-983b-428a63aa88032019-09-26 15:12:23,029 > [RATISCREATEPIPELINE1] ERROR pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 > for > c1f4d375-683b-42fe-983b-428a63aa8803org.apache.ratis.protocol.TimeoutIOException: > deadline exceeded after 2999881264ns at > org.apache.ratis.grpc.GrpcUtil.tryUnwrapException(GrpcUtil.java:82) at > org.apache.ratis.grpc.GrpcUtil.unwrapException(GrpcUtil.java:75) at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.blockingCall(GrpcClientProtocolClient.java:178) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.groupAdd(GrpcClientProtocolClient.java:147) > at > org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:94) > at > org.apache.ratis.client.impl.RaftClientImpl.sendRequest(RaftClientImpl.java:278) > at > org.apache.ratis.client.impl.RaftClientImpl.groupAdd(RaftClientImpl.java:205) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$initializePipeline$1(RatisPipelineProvider.java:142) > at > 
org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$null$2(RatisPipelineProvider.java:177) > at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) at > java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291) at > java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731) at > java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) at > java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401) at > java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734) at > java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233) at > java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) at > java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:583) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$callRatisRpc$3(RatisPipelineProvider.java:171)
[jira] [Updated] (HDDS-2186) Fix tests using MiniOzoneCluster for its memory related exceptions
[ https://issues.apache.org/jira/browse/HDDS-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng updated HDDS-2186: --- Fix Version/s: HDDS-1564 > Fix tests using MiniOzoneCluster for its memory related exceptions > -- > > Key: HDDS-2186 > URL: https://issues.apache.org/jira/browse/HDDS-2186 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Affects Versions: HDDS-1564 >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > Labels: flaky-test > Fix For: HDDS-1564 > > > After multi-raft usage, MiniOzoneCluster seems to be fishy and reports a > bunch of 'out of memory' exceptions in ratis. Attached sample stacks. > > 2019-09-26 15:12:22,824 > [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] > ERROR segmented.SegmentedRaftLogWorker > (SegmentedRaftLogWorker.java:run(323)) - > 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker > hit exception2019-09-26 15:12:22,824 > [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] > ERROR segmented.SegmentedRaftLogWorker > (SegmentedRaftLogWorker.java:run(323)) - > 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker > hit exceptionjava.lang.OutOfMemoryError: Direct buffer memory at > java.nio.Bits.reserveMemory(Bits.java:694) at > java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) at > java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) at > org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.(BufferedWriteChannel.java:41) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.(SegmentedRaftLogOutputStream.java:72) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:566) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:289) > at java.lang.Thread.run(Thread.java:748) > > which leads to: > 2019-09-26 15:12:23,029 [RATISCREATEPIPELINE1] ERROR > pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 > for c1f4d375-683b-42fe-983b-428a63aa88032019-09-26 15:12:23,029 > [RATISCREATEPIPELINE1] ERROR pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 > for > c1f4d375-683b-42fe-983b-428a63aa8803org.apache.ratis.protocol.TimeoutIOException: > deadline exceeded after 2999881264ns at > org.apache.ratis.grpc.GrpcUtil.tryUnwrapException(GrpcUtil.java:82) at > org.apache.ratis.grpc.GrpcUtil.unwrapException(GrpcUtil.java:75) at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.blockingCall(GrpcClientProtocolClient.java:178) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.groupAdd(GrpcClientProtocolClient.java:147) > at > org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:94) > at > org.apache.ratis.client.impl.RaftClientImpl.sendRequest(RaftClientImpl.java:278) > at > org.apache.ratis.client.impl.RaftClientImpl.groupAdd(RaftClientImpl.java:205) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$initializePipeline$1(RatisPipelineProvider.java:142) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$null$2(RatisPipelineProvider.java:177) > at 
java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) at > java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291) at > java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731) at > java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) at > java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401) at > java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734) at > java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233) at > java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) at > java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:583) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$callRatisRpc$3(RatisPipelineProvider.java:171) > at >
[jira] [Updated] (HDDS-2186) Fix tests using MiniOzoneCluster for its memory related exceptions
[ https://issues.apache.org/jira/browse/HDDS-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng updated HDDS-2186: --- Component/s: test > Fix tests using MiniOzoneCluster for its memory related exceptions > -- > > Key: HDDS-2186 > URL: https://issues.apache.org/jira/browse/HDDS-2186 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: test >Affects Versions: HDDS-1564 >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > Labels: flaky-test > Fix For: HDDS-1564 > > > After multi-raft usage, MiniOzoneCluster seems to be fishy and reports a > bunch of 'out of memory' exceptions in ratis. Attached sample stacks. > > 2019-09-26 15:12:22,824 > [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] > ERROR segmented.SegmentedRaftLogWorker > (SegmentedRaftLogWorker.java:run(323)) - > 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker > hit exception2019-09-26 15:12:22,824 > [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] > ERROR segmented.SegmentedRaftLogWorker > (SegmentedRaftLogWorker.java:run(323)) - > 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker > hit exceptionjava.lang.OutOfMemoryError: Direct buffer memory at > java.nio.Bits.reserveMemory(Bits.java:694) at > java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) at > java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) at > org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.(BufferedWriteChannel.java:41) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.(SegmentedRaftLogOutputStream.java:72) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:566) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:289) > at java.lang.Thread.run(Thread.java:748) > > which leads to: > 2019-09-26 15:12:23,029 [RATISCREATEPIPELINE1] ERROR > pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 > for c1f4d375-683b-42fe-983b-428a63aa88032019-09-26 15:12:23,029 > [RATISCREATEPIPELINE1] ERROR pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 > for > c1f4d375-683b-42fe-983b-428a63aa8803org.apache.ratis.protocol.TimeoutIOException: > deadline exceeded after 2999881264ns at > org.apache.ratis.grpc.GrpcUtil.tryUnwrapException(GrpcUtil.java:82) at > org.apache.ratis.grpc.GrpcUtil.unwrapException(GrpcUtil.java:75) at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.blockingCall(GrpcClientProtocolClient.java:178) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.groupAdd(GrpcClientProtocolClient.java:147) > at > org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:94) > at > org.apache.ratis.client.impl.RaftClientImpl.sendRequest(RaftClientImpl.java:278) > at > org.apache.ratis.client.impl.RaftClientImpl.groupAdd(RaftClientImpl.java:205) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$initializePipeline$1(RatisPipelineProvider.java:142) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$null$2(RatisPipelineProvider.java:177) > at 
java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) at > java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291) at > java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731) at > java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) at > java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401) at > java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734) at > java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233) at > java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) at > java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:583) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$callRatisRpc$3(RatisPipelineProvider.java:171) > at >
[jira] [Commented] (HDDS-2186) Fix tests using MiniOzoneCluster for its memory related exceptions
[ https://issues.apache.org/jira/browse/HDDS-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949930#comment-16949930 ] Li Cheng commented on HDDS-2186: It turns out the MiniOzoneCluster out-of-memory is triggered by endless pipeline creation. Added logic to restrict endless pipeline creation in [https://github.com/apache/hadoop/pull/1431]. > Fix tests using MiniOzoneCluster for its memory related exceptions > -- > > Key: HDDS-2186 > URL: https://issues.apache.org/jira/browse/HDDS-2186 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Affects Versions: HDDS-1564 >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > Labels: flaky-test > > After multi-raft usage, MiniOzoneCluster seems to be fishy and reports a > bunch of 'out of memory' exceptions in ratis. Attached sample stacks. > > 2019-09-26 15:12:22,824 > [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] > ERROR segmented.SegmentedRaftLogWorker > (SegmentedRaftLogWorker.java:run(323)) - > 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker > hit exception2019-09-26 15:12:22,824 > [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] > ERROR segmented.SegmentedRaftLogWorker > (SegmentedRaftLogWorker.java:run(323)) - > 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker > hit exceptionjava.lang.OutOfMemoryError: Direct buffer memory at > java.nio.Bits.reserveMemory(Bits.java:694) at > java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) at > java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) at > org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.(BufferedWriteChannel.java:41) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.(SegmentedRaftLogOutputStream.java:72) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:566) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:289) > at java.lang.Thread.run(Thread.java:748) > > which leads to: > 2019-09-26 15:12:23,029 [RATISCREATEPIPELINE1] ERROR > pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 > for c1f4d375-683b-42fe-983b-428a63aa88032019-09-26 15:12:23,029 > [RATISCREATEPIPELINE1] ERROR pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 > for > c1f4d375-683b-42fe-983b-428a63aa8803org.apache.ratis.protocol.TimeoutIOException: > deadline exceeded after 2999881264ns at > org.apache.ratis.grpc.GrpcUtil.tryUnwrapException(GrpcUtil.java:82) at > org.apache.ratis.grpc.GrpcUtil.unwrapException(GrpcUtil.java:75) at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.blockingCall(GrpcClientProtocolClient.java:178) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.groupAdd(GrpcClientProtocolClient.java:147) > at > org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:94) > at > org.apache.ratis.client.impl.RaftClientImpl.sendRequest(RaftClientImpl.java:278) > at > org.apache.ratis.client.impl.RaftClientImpl.groupAdd(RaftClientImpl.java:205) > at > 
org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$initializePipeline$1(RatisPipelineProvider.java:142) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$null$2(RatisPipelineProvider.java:177) > at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) at > java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291) at > java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731) at > java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) at > java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401) at > java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734) at > java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233) at > java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) at > java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:583) > at >
[jira] [Assigned] (HDDS-2186) Fix tests using MiniOzoneCluster for its memory related exceptions
[ https://issues.apache.org/jira/browse/HDDS-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng reassigned HDDS-2186: -- Assignee: Li Cheng > Fix tests using MiniOzoneCluster for its memory related exceptions > -- > > Key: HDDS-2186 > URL: https://issues.apache.org/jira/browse/HDDS-2186 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Affects Versions: HDDS-1564 >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > Labels: flaky-test > > After multi-raft usage, MiniOzoneCluster seems to be fishy and reports a > bunch of 'out of memory' exceptions in ratis. Attached sample stacks. > > 2019-09-26 15:12:22,824 > [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] > ERROR segmented.SegmentedRaftLogWorker > (SegmentedRaftLogWorker.java:run(323)) - > 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker > hit exception2019-09-26 15:12:22,824 > [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] > ERROR segmented.SegmentedRaftLogWorker > (SegmentedRaftLogWorker.java:run(323)) - > 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker > hit exceptionjava.lang.OutOfMemoryError: Direct buffer memory at > java.nio.Bits.reserveMemory(Bits.java:694) at > java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) at > java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) at > org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.(BufferedWriteChannel.java:41) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.(SegmentedRaftLogOutputStream.java:72) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:566) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:289) > at java.lang.Thread.run(Thread.java:748) > > which leads to: > 2019-09-26 15:12:23,029 [RATISCREATEPIPELINE1] ERROR > pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 > for c1f4d375-683b-42fe-983b-428a63aa88032019-09-26 15:12:23,029 > [RATISCREATEPIPELINE1] ERROR pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 > for > c1f4d375-683b-42fe-983b-428a63aa8803org.apache.ratis.protocol.TimeoutIOException: > deadline exceeded after 2999881264ns at > org.apache.ratis.grpc.GrpcUtil.tryUnwrapException(GrpcUtil.java:82) at > org.apache.ratis.grpc.GrpcUtil.unwrapException(GrpcUtil.java:75) at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.blockingCall(GrpcClientProtocolClient.java:178) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.groupAdd(GrpcClientProtocolClient.java:147) > at > org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:94) > at > org.apache.ratis.client.impl.RaftClientImpl.sendRequest(RaftClientImpl.java:278) > at > org.apache.ratis.client.impl.RaftClientImpl.groupAdd(RaftClientImpl.java:205) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$initializePipeline$1(RatisPipelineProvider.java:142) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$null$2(RatisPipelineProvider.java:177) > at 
java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) at > java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291) at > java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731) at > java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) at > java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401) at > java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734) at > java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233) at > java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) at > java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:583) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$callRatisRpc$3(RatisPipelineProvider.java:171) > at > java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1386) > at
[jira] [Commented] (HDDS-2186) Fix tests using MiniOzoneCluster for its memory related exceptions
[ https://issues.apache.org/jira/browse/HDDS-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939296#comment-16939296 ] Li Cheng commented on HDDS-2186: [~ljain] After some investigation, it turns out MiniOzoneCluster is abusing resources to create pipelines. The reason it didn't have problems before is that every datanode could only be assigned to one pipeline, so the quota ran out quickly. Now that limit has been removed, nothing stops the cluster from creating pipelines until Ratis reports that resources such as memory are exhausted. I'm adding logic to prevent this, but unfortunately factor ONE and factor THREE pipelines need to be handled differently, so the logic grows more and more complex. > Fix tests using MiniOzoneCluster for its memory related exceptions > -- > > Key: HDDS-2186 > URL: https://issues.apache.org/jira/browse/HDDS-2186 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Affects Versions: HDDS-1564 >Reporter: Li Cheng >Priority: Major > Labels: flaky-test > > After multi-raft usage, MiniOzoneCluster seems to be fishy and reports a > bunch of 'out of memory' exceptions in ratis. Attached sample stacks. > > 2019-09-26 15:12:22,824 > [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] > ERROR segmented.SegmentedRaftLogWorker > (SegmentedRaftLogWorker.java:run(323)) - > 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker > hit exception2019-09-26 15:12:22,824 > [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] > ERROR segmented.SegmentedRaftLogWorker > (SegmentedRaftLogWorker.java:run(323)) - > 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker > hit exceptionjava.lang.OutOfMemoryError: Direct buffer memory at > java.nio.Bits.reserveMemory(Bits.java:694) at > java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) at > java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) at > org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.(BufferedWriteChannel.java:41) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.(SegmentedRaftLogOutputStream.java:72) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:566) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:289) > at java.lang.Thread.run(Thread.java:748) > > which leads to: > 2019-09-26 15:12:23,029 [RATISCREATEPIPELINE1] ERROR > pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 > for c1f4d375-683b-42fe-983b-428a63aa88032019-09-26 15:12:23,029 > [RATISCREATEPIPELINE1] ERROR pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 > for > c1f4d375-683b-42fe-983b-428a63aa8803org.apache.ratis.protocol.TimeoutIOException: > deadline exceeded after 2999881264ns at > org.apache.ratis.grpc.GrpcUtil.tryUnwrapException(GrpcUtil.java:82) at > org.apache.ratis.grpc.GrpcUtil.unwrapException(GrpcUtil.java:75) at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.blockingCall(GrpcClientProtocolClient.java:178) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.groupAdd(GrpcClientProtocolClient.java:147) > at >
org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:94) > at > org.apache.ratis.client.impl.RaftClientImpl.sendRequest(RaftClientImpl.java:278) > at > org.apache.ratis.client.impl.RaftClientImpl.groupAdd(RaftClientImpl.java:205) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$initializePipeline$1(RatisPipelineProvider.java:142) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$null$2(RatisPipelineProvider.java:177) > at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) at > java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291) at > java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731) at > java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) at > java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401) at > java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734) at > java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160) > at >
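For illustration, the kind of per-datanode guard being described in the comment above might look like the following minimal sketch. The names are hypothetical, not the actual HDDS-2186 patch; the assumption that only factor THREE membership counts against the cap is also mine.

{code:java}
// Minimal sketch of the guard described above; hypothetical names,
// not the actual HDDS-2186 patch.
public final class PipelineQuotaGuard {

  private final int heavyNodeLimit; // soft cap on pipelines per datanode

  public PipelineQuotaGuard(int heavyNodeLimit) {
    this.heavyNodeLimit = heavyNodeLimit;
  }

  /**
   * Decide whether a datanode may join one more pipeline. In this sketch
   * factor ONE (single-node) pipelines are treated as cheap and handled
   * separately, so only factor THREE membership counts against the cap.
   */
  public boolean canJoinPipeline(int openFactorThreePipelines,
                                 int replicationFactor) {
    if (replicationFactor == 1) {
      return true; // factor ONE handled by its own rule
    }
    return openFactorThreePipelines < heavyNodeLimit;
  }
}
{code}

The split mirrors the complexity the comment mentions: the two factors consume very different amounts of Ratis resources, so one shared counter is not enough.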
[jira] [Commented] (HDDS-2186) Fix tests using MiniOzoneCluster for its memory related exceptions
[ https://issues.apache.org/jira/browse/HDDS-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938670#comment-16938670 ] Li Cheng commented on HDDS-2186: Note that CI is also affected and it cannot print out the correct output due to memory issues. > Fix tests using MiniOzoneCluster for its memory related exceptions > -- > > Key: HDDS-2186 > URL: https://issues.apache.org/jira/browse/HDDS-2186 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Affects Versions: HDDS-1564 >Reporter: Li Cheng >Priority: Major > Labels: flaky-test > > After multi-raft usage, MiniOzoneCluster seems to be fishy and reports a > bunch of 'out of memory' exceptions in ratis. Attached sample stacks. > > 2019-09-26 15:12:22,824 > [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] > ERROR segmented.SegmentedRaftLogWorker > (SegmentedRaftLogWorker.java:run(323)) - > 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker > hit exception2019-09-26 15:12:22,824 > [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] > ERROR segmented.SegmentedRaftLogWorker > (SegmentedRaftLogWorker.java:run(323)) - > 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker > hit exceptionjava.lang.OutOfMemoryError: Direct buffer memory at > java.nio.Bits.reserveMemory(Bits.java:694) at > java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) at > java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) at > org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.(BufferedWriteChannel.java:41) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.(SegmentedRaftLogOutputStream.java:72) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:566) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:289) > at java.lang.Thread.run(Thread.java:748) > > which leads to: > 2019-09-26 15:12:23,029 [RATISCREATEPIPELINE1] ERROR > pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 > for c1f4d375-683b-42fe-983b-428a63aa88032019-09-26 15:12:23,029 > [RATISCREATEPIPELINE1] ERROR pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 > for > c1f4d375-683b-42fe-983b-428a63aa8803org.apache.ratis.protocol.TimeoutIOException: > deadline exceeded after 2999881264ns at > org.apache.ratis.grpc.GrpcUtil.tryUnwrapException(GrpcUtil.java:82) at > org.apache.ratis.grpc.GrpcUtil.unwrapException(GrpcUtil.java:75) at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.blockingCall(GrpcClientProtocolClient.java:178) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.groupAdd(GrpcClientProtocolClient.java:147) > at > org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:94) > at > org.apache.ratis.client.impl.RaftClientImpl.sendRequest(RaftClientImpl.java:278) > at > org.apache.ratis.client.impl.RaftClientImpl.groupAdd(RaftClientImpl.java:205) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$initializePipeline$1(RatisPipelineProvider.java:142) > at > 
org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$null$2(RatisPipelineProvider.java:177) > at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) at > java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291) at > java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731) at > java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) at > java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401) at > java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734) at > java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233) at > java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) at > java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:583) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$callRatisRpc$3(RatisPipelineProvider.java:171) > at >
[jira] [Updated] (HDDS-2186) Fix tests using MiniOzoneCluster for its memory related exceptions
[ https://issues.apache.org/jira/browse/HDDS-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng updated HDDS-2186: --- Labels: flaky-test (was: ) > Fix tests using MiniOzoneCluster for its memory related exceptions > -- > > Key: HDDS-2186 > URL: https://issues.apache.org/jira/browse/HDDS-2186 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Affects Versions: HDDS-1564 >Reporter: Li Cheng >Priority: Major > Labels: flaky-test > > After multi-raft usage, MiniOzoneCluster seems to be fishy and reports a > bunch of 'out of memory' exceptions in ratis. Attached sample stacks. > > 2019-09-26 15:12:22,824 > [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] > ERROR segmented.SegmentedRaftLogWorker > (SegmentedRaftLogWorker.java:run(323)) - > 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker > hit exception2019-09-26 15:12:22,824 > [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] > ERROR segmented.SegmentedRaftLogWorker > (SegmentedRaftLogWorker.java:run(323)) - > 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker > hit exceptionjava.lang.OutOfMemoryError: Direct buffer memory at > java.nio.Bits.reserveMemory(Bits.java:694) at > java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) at > java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) at > org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.(BufferedWriteChannel.java:41) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.(SegmentedRaftLogOutputStream.java:72) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:566) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:289) > at java.lang.Thread.run(Thread.java:748) > > which leads to: > 2019-09-26 15:12:23,029 [RATISCREATEPIPELINE1] ERROR > pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 > for c1f4d375-683b-42fe-983b-428a63aa88032019-09-26 15:12:23,029 > [RATISCREATEPIPELINE1] ERROR pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 > for > c1f4d375-683b-42fe-983b-428a63aa8803org.apache.ratis.protocol.TimeoutIOException: > deadline exceeded after 2999881264ns at > org.apache.ratis.grpc.GrpcUtil.tryUnwrapException(GrpcUtil.java:82) at > org.apache.ratis.grpc.GrpcUtil.unwrapException(GrpcUtil.java:75) at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.blockingCall(GrpcClientProtocolClient.java:178) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.groupAdd(GrpcClientProtocolClient.java:147) > at > org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:94) > at > org.apache.ratis.client.impl.RaftClientImpl.sendRequest(RaftClientImpl.java:278) > at > org.apache.ratis.client.impl.RaftClientImpl.groupAdd(RaftClientImpl.java:205) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$initializePipeline$1(RatisPipelineProvider.java:142) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$null$2(RatisPipelineProvider.java:177) > at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) > at > 
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) at > java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291) at > java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731) at > java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) at > java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401) at > java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734) at > java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233) at > java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) at > java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:583) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$callRatisRpc$3(RatisPipelineProvider.java:171) > at > java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1386) > at
[jira] [Updated] (HDDS-2186) Fix tests using MiniOzoneCluster for its memory related exceptions
[ https://issues.apache.org/jira/browse/HDDS-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng updated HDDS-2186: --- Affects Version/s: HDDS-1564 > Fix tests using MiniOzoneCluster for its memory related exceptions > -- > > Key: HDDS-2186 > URL: https://issues.apache.org/jira/browse/HDDS-2186 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Affects Versions: HDDS-1564 >Reporter: Li Cheng >Priority: Major > > After multi-raft usage, MiniOzoneCluster seems to be fishy and reports a > bunch of 'out of memory' exceptions in ratis. Attached sample stacks. > > 2019-09-26 15:12:22,824 > [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] > ERROR segmented.SegmentedRaftLogWorker > (SegmentedRaftLogWorker.java:run(323)) - > 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker > hit exception2019-09-26 15:12:22,824 > [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] > ERROR segmented.SegmentedRaftLogWorker > (SegmentedRaftLogWorker.java:run(323)) - > 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker > hit exceptionjava.lang.OutOfMemoryError: Direct buffer memory at > java.nio.Bits.reserveMemory(Bits.java:694) at > java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) at > java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) at > org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.(BufferedWriteChannel.java:41) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.(SegmentedRaftLogOutputStream.java:72) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:566) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:289) > at java.lang.Thread.run(Thread.java:748) > > which leads to: > 2019-09-26 15:12:23,029 [RATISCREATEPIPELINE1] ERROR > pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 > for c1f4d375-683b-42fe-983b-428a63aa88032019-09-26 15:12:23,029 > [RATISCREATEPIPELINE1] ERROR pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 > for > c1f4d375-683b-42fe-983b-428a63aa8803org.apache.ratis.protocol.TimeoutIOException: > deadline exceeded after 2999881264ns at > org.apache.ratis.grpc.GrpcUtil.tryUnwrapException(GrpcUtil.java:82) at > org.apache.ratis.grpc.GrpcUtil.unwrapException(GrpcUtil.java:75) at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.blockingCall(GrpcClientProtocolClient.java:178) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.groupAdd(GrpcClientProtocolClient.java:147) > at > org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:94) > at > org.apache.ratis.client.impl.RaftClientImpl.sendRequest(RaftClientImpl.java:278) > at > org.apache.ratis.client.impl.RaftClientImpl.groupAdd(RaftClientImpl.java:205) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$initializePipeline$1(RatisPipelineProvider.java:142) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$null$2(RatisPipelineProvider.java:177) > at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) > at > 
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) at > java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291) at > java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731) at > java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) at > java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401) at > java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734) at > java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233) at > java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) at > java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:583) > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$callRatisRpc$3(RatisPipelineProvider.java:171) > at > java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1386) > at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) at >
[jira] [Created] (HDDS-2186) Fix tests using MiniOzoneCluster for its memory related exceptions
Li Cheng created HDDS-2186: -- Summary: Fix tests using MiniOzoneCluster for its memory related exceptions Key: HDDS-2186 URL: https://issues.apache.org/jira/browse/HDDS-2186 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Li Cheng

After multi-raft usage, MiniOzoneCluster seems to be fishy and reports a bunch of 'out of memory' exceptions in ratis. Attached sample stacks.

2019-09-26 15:12:22,824 [2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker] ERROR segmented.SegmentedRaftLogWorker (SegmentedRaftLogWorker.java:run(323)) - 2e1e11ca-833a-4fbc-b948-3d93fc8e7288@group-218F3868CEA9-SegmentedRaftLogWorker hit exception
java.lang.OutOfMemoryError: Direct buffer memory
 at java.nio.Bits.reserveMemory(Bits.java:694)
 at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
 at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
 at org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.<init>(BufferedWriteChannel.java:41)
 at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.<init>(SegmentedRaftLogOutputStream.java:72)
 at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:566)
 at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:289)
 at java.lang.Thread.run(Thread.java:748)

which leads to:

2019-09-26 15:12:23,029 [RATISCREATEPIPELINE1] ERROR pipeline.RatisPipelineProvider (RatisPipelineProvider.java:lambda$null$2(181)) - Failed invoke Ratis rpc org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider$$Lambda$297/1222454951@55d1e990 for c1f4d375-683b-42fe-983b-428a63aa8803
org.apache.ratis.protocol.TimeoutIOException: deadline exceeded after 2999881264ns
 at org.apache.ratis.grpc.GrpcUtil.tryUnwrapException(GrpcUtil.java:82)
 at org.apache.ratis.grpc.GrpcUtil.unwrapException(GrpcUtil.java:75)
 at org.apache.ratis.grpc.client.GrpcClientProtocolClient.blockingCall(GrpcClientProtocolClient.java:178)
 at org.apache.ratis.grpc.client.GrpcClientProtocolClient.groupAdd(GrpcClientProtocolClient.java:147)
 at org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:94)
 at org.apache.ratis.client.impl.RaftClientImpl.sendRequest(RaftClientImpl.java:278)
 at org.apache.ratis.client.impl.RaftClientImpl.groupAdd(RaftClientImpl.java:205)
 at org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$initializePipeline$1(RatisPipelineProvider.java:142)
 at org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$null$2(RatisPipelineProvider.java:177)
 at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
 at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
 at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
 at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
 at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
 at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
 at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401)
 at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734)
 at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
 at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174)
 at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
 at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
 at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:583)
 at org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.lambda$callRatisRpc$3(RatisPipelineProvider.java:171)
 at java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1386)
 at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
 at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
 at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
 at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
Caused by: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 2999881264ns at
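The root failure above is direct (off-heap) buffer exhaustion, not heap exhaustion: the stack shows ByteBuffer.allocateDirect inside the BufferedWriteChannel constructor, so every extra raft group a datanode joins adds direct-buffer demand. While the test itself is being fixed, one possible stopgap, offered here only as a hedged suggestion and not as part of the actual fix, is to raise the test JVM's direct-memory ceiling explicitly:

{code}
# Illustrative only: give the MiniOzoneCluster test JVM more direct memory
-XX:MaxDirectMemorySize=512m
{code}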
[jira] [Assigned] (HDDS-1574) Ensure same datanodes are not a part of multiple pipelines
[ https://issues.apache.org/jira/browse/HDDS-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng reassigned HDDS-1574: -- Assignee: Li Cheng (was: Siddharth Wagle) > Ensure same datanodes are not a part of multiple pipelines > -- > > Key: HDDS-1574 > URL: https://issues.apache.org/jira/browse/HDDS-1574 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Siddharth Wagle >Assignee: Li Cheng >Priority: Major > > Details in design doc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-1570) Refactor heartbeat reports to report all the pipelines that are open
[ https://issues.apache.org/jira/browse/HDDS-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng reassigned HDDS-1570: -- Assignee: Li Cheng (was: Siddharth Wagle) > Refactor heartbeat reports to report all the pipelines that are open > > > Key: HDDS-1570 > URL: https://issues.apache.org/jira/browse/HDDS-1570 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Datanode >Reporter: Siddharth Wagle >Assignee: Li Cheng >Priority: Major > > Presently the pipeline report only reports a single pipeline id. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1933) Datanode should use hostname in place of IP addresses to allow DNs to work when IP addresses change
[ https://issues.apache.org/jira/browse/HDDS-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933136#comment-16933136 ] Li Cheng commented on HDDS-1933: After discussion, we think the change can remain as it is for now. [~msingh], please try setting DFS_DATANODE_USE_DN_HOSTNAME_DEFAULT = true in the Kubernetes environment and see if it works. > Datanode should use hostname in place of IP addresses to allow DNs to work > when IP addresses change > - > > Key: HDDS-1933 > URL: https://issues.apache.org/jira/browse/HDDS-1933 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode, SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Priority: Blocker > > This was noticed by [~elek] while deploying Ozone in a Kubernetes based > environment. > When the datanode IP address changes on restart, the Datanode details cease to > be correct for the datanode, and this prevents the cluster from functioning > after a restart. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
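For reference, the DFS_DATANODE_USE_DN_HOSTNAME_DEFAULT constant mentioned above is only the compile-time default; what one would actually set is the corresponding property in the site configuration. A sketch of that setting, assuming the standard HDFS key name applies here:

{code}
<!-- hdfs-site.xml / ozone-site.xml: prefer hostnames over IPs for datanodes -->
<property>
  <name>dfs.datanode.use.datanode.hostname</name>
  <value>true</value>
</property>
{code}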
[jira] [Commented] (HDDS-1933) Datanode should use hostname in place of IP addresses to allow DNs to work when IP addresses change
[ https://issues.apache.org/jira/browse/HDDS-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932172#comment-16932172 ] Li Cheng commented on HDDS-1933: [~swagle] I was talking with [~Sammi] about this. How do we decide whether to use IP or hostname? Or rather, in which environments should we use one or the other? Currently users can configure it in the configuration XML; shall we keep it this way? > Datanode should use hostname in place of IP addresses to allow DNs to work > when IP addresses change > - > > Key: HDDS-1933 > URL: https://issues.apache.org/jira/browse/HDDS-1933 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode, SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Priority: Blocker > > This was noticed by [~elek] while deploying Ozone in a Kubernetes based > environment. > When the datanode IP address changes on restart, the Datanode details cease to > be correct for the datanode, and this prevents the cluster from functioning > after a restart. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2116) Create SCMPipelineAllocationManager as background thread for pipeline creation
[ https://issues.apache.org/jira/browse/HDDS-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932146#comment-16932146 ] Li Cheng commented on HDDS-2116: [~swagle] I think for now the role is taken by PipelinePlacementPolicy, which is instantiated with the config as well. It retrieves viable datanodes from NodeManager and applies complex filter rules, which seems quite like what an AllocationManager should do. > Create SCMPipelineAllocationManager as background thread for pipeline creation > -- > > Key: HDDS-2116 > URL: https://issues.apache.org/jira/browse/HDDS-2116 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Li Cheng >Priority: Major > > SCMAllocationManager can be leveraged to get a candidate set of datanodes > based on placement policies. And it should make the pipeline creation process > async and multi-threaded. This should be done when we encounter a performance > bottleneck in pipeline creation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
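As a rough sketch of the flow the comment describes, the filter chain amounts to something like the following. This is simplified and illustrative; the real PipelinePlacementPolicy has a different API and more rules.

{code:java}
import java.util.List;
import java.util.stream.Collectors;

// Simplified sketch of the filter flow described above; illustrative
// only, not the committed PipelinePlacementPolicy.
final class PlacementSketch {
  interface Node {
    boolean isHealthy();
    int pipelineCount();
  }

  static List<Node> pickCandidates(List<Node> fromNodeManager,
                                   int pipelineLimit, int required) {
    List<Node> viable = fromNodeManager.stream()
        .filter(Node::isHealthy)                        // health check
        .filter(n -> n.pipelineCount() < pipelineLimit) // load check
        .collect(Collectors.toList());
    if (viable.size() < required) {
      throw new IllegalStateException("Not enough viable datanodes");
    }
    return viable.subList(0, required);
  }
}
{code}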
[jira] [Commented] (HDDS-2116) Create SCMAllocationManager as background thread for pipeline creation
[ https://issues.apache.org/jira/browse/HDDS-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931329#comment-16931329 ] Li Cheng commented on HDDS-2116: I agree on the multi-thread part. [~swagle], what other functions do you think SCMAllocationManager should have? Or is SCMAllocationManager still needed? > Create SCMAllocationManager as background thread for pipeline creation > -- > > Key: HDDS-2116 > URL: https://issues.apache.org/jira/browse/HDDS-2116 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Li Cheng >Priority: Major > > SCMAllocationManager can be leveraged to get a candidate set of datanodes > based on placement policies. And it should make the pipeline creation process > async and multi-threaded. This should be done when we encounter a performance > bottleneck in pipeline creation. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1569) Add ability to SCM for creating multiple pipelines with same datanode
[ https://issues.apache.org/jira/browse/HDDS-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928362#comment-16928362 ] Li Cheng commented on HDDS-1569: I separated SCMAllocationManager into another Jira: https://issues.apache.org/jira/browse/HDDS-2116. I think the current BackgroundCreator creates pipelines synchronously, and we can start from there. As [~swagle] mentioned, we don't assume many ad-hoc pipeline creation operations for now, so the current sync call from the background thread should be able to handle this. Once we hit a performance bottleneck for pipeline creation, we can invest in a SCMAllocationManager with a self-managed thread model. > Add ability to SCM for creating multiple pipelines with same datanode > - > > Key: HDDS-1569 > URL: https://issues.apache.org/jira/browse/HDDS-1569 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Siddharth Wagle >Assignee: Li Cheng >Priority: Major > > - Refactor _RatisPipelineProvider.create()_ to be able to create pipelines > with datanodes that are not a part of sufficient pipelines > - Define soft and hard upper bounds for pipeline membership > - Create SCMAllocationManager that can be leveraged to get a candidate set of > datanodes based on placement policies > - Add the datanodes to internal data structures -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2116) Create SCMAllocationManager as background thread for pipeline creation
Li Cheng created HDDS-2116: -- Summary: Create SCMAllocationManager as background thread for pipeline creation Key: HDDS-2116 URL: https://issues.apache.org/jira/browse/HDDS-2116 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Li Cheng SCMAllocationManager can be leveraged to get a candidate set of datanodes based on placement policies. And it should make the pipeline creation process async and multi-threaded. This should be done when we encounter a performance bottleneck in pipeline creation. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
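A minimal sketch of what the async path could look like, assuming a dedicated worker pool; the class and method names here are hypothetical, not existing SCM code:

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the async, multi-threaded creation path the description
// calls for; hypothetical names, not an actual SCM class.
final class AllocationSketch {
  private final ExecutorService pool = Executors.newFixedThreadPool(4);

  CompletableFuture<Void> createPipelineAsync(Runnable createCall) {
    // Off-load the (blocking) Ratis group creation to a worker thread
    // so the caller's event loop is not held up.
    return CompletableFuture.runAsync(createCall, pool);
  }
}
{code}

As the comments above note, the extra thread model only pays off once synchronous background creation becomes the bottleneck.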
[jira] [Assigned] (HDDS-2115) Add smoke/acceptance test for Pipeline related CLI
[ https://issues.apache.org/jira/browse/HDDS-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng reassigned HDDS-2115: -- Assignee: Li Cheng > Add smoke/acceptance test for Pipeline related CLI > --- > > Key: HDDS-2115 > URL: https://issues.apache.org/jira/browse/HDDS-2115 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > > Currently there is no smoke/acceptance test for the createPipeline or > listPipeline CLI. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2115) Add smoke/acceptance test for Pipeline related CLI
Li Cheng created HDDS-2115: -- Summary: Add smoke/acceptance test for Pipeline related CLI Key: HDDS-2115 URL: https://issues.apache.org/jira/browse/HDDS-2115 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Li Cheng Currently there is no smoke/acceptance test for the createPipeline or listPipeline CLI. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-1571) Create an interface for pipeline placement policy to support network topologies
[ https://issues.apache.org/jira/browse/HDDS-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng resolved HDDS-1571. Fix Version/s: 0.5.0 Resolution: Fixed > Create an interface for pipeline placement policy to support network > topologies > --- > > Key: HDDS-1571 > URL: https://issues.apache.org/jira/browse/HDDS-1571 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Siddharth Wagle >Assignee: Li Cheng >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Leverage the work done in HDDS-700 for pipeline creation for open containers. > Create an interface that can provide different policy implementations for > pipeline creation. The default implementation should handle the case where no > topology information is configured. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-1564) Ozone multi-raft support
[ https://issues.apache.org/jira/browse/HDDS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng reassigned HDDS-1564: -- Assignee: Li Cheng (was: Siddharth Wagle) > Ozone multi-raft support > > > Key: HDDS-1564 > URL: https://issues.apache.org/jira/browse/HDDS-1564 > Project: Hadoop Distributed Data Store > Issue Type: New Feature > Components: Ozone Datanode, SCM >Reporter: Siddharth Wagle >Assignee: Li Cheng >Priority: Major > Attachments: Ozone Multi-Raft Support.pdf > > Apache Ratis supports multi-raft by allowing the same node to be a part of > multiple raft groups. The proposal is to allow datanodes to be a part of > multiple raft groups. The attached design doc explains the reasons for doing > this as well as a few initial design decisions. > Some of the work in this feature is also related to HDDS-700, which implements > rack-aware container placement for closed containers. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
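Conceptually, the feature boils down to letting a datanode's Ratis server join additional raft groups. A hedged sketch against the Ratis client API of this era follows; signatures are approximate, but groupAdd is the same call visible in the HDDS-2186 stack traces above:

{code:java}
import org.apache.ratis.client.RaftClient;
import org.apache.ratis.conf.RaftProperties;
import org.apache.ratis.protocol.*;

import java.io.IOException;
import java.util.Collection;

// Hedged sketch: the essence of multi-raft is asking datanodes that
// already serve one raft group to join another. Signatures are
// approximate for the Ratis version of this era.
final class MultiRaftSketch {
  static void joinSecondGroup(RaftGroup newGroup,
                              Collection<RaftPeer> members) throws IOException {
    try (RaftClient client = RaftClient.newBuilder()
        .setProperties(new RaftProperties())
        .setRaftGroup(newGroup)
        .build()) {
      for (RaftPeer peer : members) {
        // Ask each member's server to join the new group.
        client.groupAdd(newGroup, peer.getId());
      }
    }
  }
}
{code}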
[jira] [Created] (HDDS-2089) Add CLI createPipeline
Li Cheng created HDDS-2089: -- Summary: Add CLI createPipeline Key: HDDS-2089 URL: https://issues.apache.org/jira/browse/HDDS-2089 Project: Hadoop Distributed Data Store Issue Type: Sub-task Components: Ozone CLI Affects Versions: 0.5.0 Reporter: Li Cheng Assignee: Li Cheng Add an SCMCLI command to create pipelines for Ozone. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1571) Create an interface for pipeline placement policy to support network topologies
[ https://issues.apache.org/jira/browse/HDDS-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923227#comment-16923227 ] Li Cheng commented on HDDS-1571: Put up PlacementPolicy as the basic interface and ScmCommonPlacementPolicy as the abstract base class for both pipeline placement and container placement. > Create an interface for pipeline placement policy to support network > topologies > --- > > Key: HDDS-1571 > URL: https://issues.apache.org/jira/browse/HDDS-1571 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Siddharth Wagle >Assignee: Li Cheng >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Leverage the work done in HDDS-700 for pipeline creation for open containers. > Create an interface that can provide different policy implementations for > pipeline creation. The default implementation should handle the case where no > topology information is configured. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
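The rough shape of that split is sketched below. This is illustrative only; the committed interfaces live under org.apache.hadoop.hdds.scm and differ in signatures.

{code:java}
import java.util.List;

// Illustrative shape of the interface/base-class split described above;
// not the committed code.
interface PlacementPolicy<N> {
  // Choose nodesRequired nodes, avoiding those in excludedNodes.
  List<N> chooseDatanodes(List<N> excludedNodes, int nodesRequired);
}

// Shared base for container and pipeline placement: common filtering
// lives here, concrete policies add their own rules.
abstract class ScmCommonPlacementPolicySketch<N> implements PlacementPolicy<N> {
  protected boolean hasEnoughSpace(N node) {
    return true; // placeholder for the shared capacity check
  }
}
{code}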
[jira] [Resolved] (HDDS-1577) Add default pipeline placement policy implementation
[ https://issues.apache.org/jira/browse/HDDS-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng resolved HDDS-1577. Fix Version/s: 0.5.0 Resolution: Fixed > Add default pipeline placement policy implementation > > > Key: HDDS-1577 > URL: https://issues.apache.org/jira/browse/HDDS-1577 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Siddharth Wagle >Assignee: Li Cheng >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 5h 20m > Remaining Estimate: 0h > > This is a simpler implementation of the PipelinePlacementPolicy that can be > utilized if no network topology is defined for the cluster. We try to form > pipelines from existing HEALTHY datanodes randomly, as long as they satisfy > PipelinePlacementCriteria. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
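The random selection described in HDDS-1577 reduces to something like this sketch: shuffle the HEALTHY nodes and take the first ones that pass the placement criteria. The names are illustrative, not the committed implementation.

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Sketch of random placement over HEALTHY nodes; illustrative only.
final class RandomPlacementSketch {
  static <N> List<N> choose(List<N> healthyNodes,
                            Predicate<N> placementCriteria, int required) {
    List<N> shuffled = new ArrayList<>(healthyNodes);
    Collections.shuffle(shuffled); // randomize candidate order
    List<N> picked = new ArrayList<>();
    for (N node : shuffled) {
      if (placementCriteria.test(node)) {
        picked.add(node);
        if (picked.size() == required) {
          return picked;
        }
      }
    }
    throw new IllegalStateException("Insufficient healthy datanodes");
  }
}
{code}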