[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972874#comment-16972874 ]
Bharat Viswanadham edited comment on HDDS-2356 at 11/12/19 11:21 PM:
---------------------------------------------------------------------
Hi [~timmylicheng],
Thanks for sharing the logs. I see an abort multipart upload request for the key plc_1570863541668_9278 once the complete multipart upload failed.
{code:java}
2019-11-08 20:08:24,830 | ERROR | OMAudit | user=root | ip=9.134.50.210 | op=COMPLETE_MULTIPART_UPLOAD {volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, key=plc_1570863541668_9278, dataSize=0, replicationType=RATIS, replicationFactor=ONE, keyLocationInfo=[], multipartList=[partNumber: 1 5626 partName: "/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102209374356085" 5627 partName: "/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102209374487158" . . 5911 partName: "/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102211984655258" 5912 ]} | ret=FAILURE | INVALID_PART org.apache.hadoop.ozone.om.exceptions.OMException: Complete Multipart Upload Failed: volume: s325d55ad283aa400af464c76d713c07adbucket: ozone-testkey: plc_1570863541668_9278
2019-11-08 20:08:24,963 | INFO | OMAudit | user=root | ip=9.134.50.210 | op=ABORT_MULTIPART_UPLOAD {volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, key=plc_1570863541668_9278, dataSize=0, replicationType=RATIS, replicationFactor=ONE, keyLocationInfo=[]}
{code}
After that, allocateBlock still continues for the key, because the abortMultipartUpload request does not remove the per-part entries from the openKeyTable (abort removes only the entry that was created during the initiateMPU request). That is also why, some time later, you see the NO_SUCH_MULTIPART_UPLOAD error during commitMultipartUploadKey: the entry has already been removed from the MultipartInfo table. (The strange thing I have observed is that the clientID does not match any of the names in the part list, even though the last part of a partName is the clientID.)
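The table interaction described above can be sketched as a toy model. This is an illustrative simulation only, not the actual OzoneManager code: the table names follow this comment (openKeyTable, MultipartInfo table), and every method name and signature here is invented for the sketch.

```python
# Toy model (NOT real OM code) of why abort leaves parts behind:
# abortMultipartUpload deletes the MultipartInfo entry, but in-flight
# per-part openKeyTable entries survive, so allocateBlock keeps going
# while any later part commit fails.

class ToyOzoneManager:
    def __init__(self):
        self.multipart_info_table = {}  # multipartKey -> uploadID
        self.open_key_table = {}        # (key, clientID) -> allocated blocks

    def initiate_mpu(self, key, upload_id):
        self.multipart_info_table[key] = upload_id

    def allocate_block(self, key, client_id):
        # allocateKey/allocateBlock for a part writes into openKeyTable
        self.open_key_table.setdefault((key, client_id), []).append("block")

    def abort_mpu(self, key):
        # Abort removes only the MultipartInfo entry (created at initiateMPU);
        # per-part openKeyTable entries are left behind.
        self.multipart_info_table.pop(key, None)

    def commit_part(self, key, client_id):
        # commitMultipartUploadKey checks the MultipartInfo table first;
        # on success the partName would end with the clientID, as noted above.
        if key not in self.multipart_info_table:
            return "NO_SUCH_MULTIPART_UPLOAD_ERROR"
        return f"/{key}{client_id}"

om = ToyOzoneManager()
om.initiate_mpu("plc_1570863541668_9278", "upload-1")
om.allocate_block("plc_1570863541668_9278", 103102209374356085)
om.abort_mpu("plc_1570863541668_9278")

# allocateBlock can keep going: the part is still in the openKeyTable...
om.allocate_block("plc_1570863541668_9278", 103102209374356085)
# ...but committing the part now fails, matching the observed error:
print(om.commit_part("plc_1570863541668_9278", 103102209374356085))
# NO_SUCH_MULTIPART_UPLOAD_ERROR
```

This matches the sequence in the logs: complete fails with INVALID_PART, abort runs, allocateBlock continues, and the eventual part commit reports NO_SUCH_MULTIPART_UPLOAD.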
Also, from the OM audit log I see partNumber 1 followed by a list of part names; I am not sure whether some of the log is truncated here, since it should show a partName/partNumber pair for each part.
# If you can confirm what parts OM has for this key: you can get this from listParts (but this has to be done before the abort request).
# Check in the OM audit log what part list we get for this key; I am not sure whether it is truncated in the uploaded log. On my cluster the audit logs look like below, where for completeMultipartUpload I can see both partNumber and partName (whereas in the uploaded log I don't see this).
{code:java}
2019-11-12 14:57:18,580 | INFO | OMAudit | user=root | ip=10.65.53.160 | op=INITIATE_MULTIPART_UPLOAD {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, key=key123, dataSize=0, replicationType=RATIS, replicationFactor=THREE, keyLocationInfo=[]} | ret=SUCCESS |
2019-11-12 14:57:53,967 | INFO | OMAudit | user=root | ip=10.65.53.160 | op=ALLOCATE_KEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=ONE, keyLocationInfo=[]} | ret=SUCCESS |
2019-11-12 14:57:53,974 | INFO | OMAudit | user=root | ip=10.65.53.160 | op=ALLOCATE_BLOCK {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=THREE, keyLocationInfo=[], clientID=103127415125868581} | ret=SUCCESS |
2019-11-12 14:57:54,154 | INFO | OMAudit | user=root | ip=10.65.53.160 | op=COMMIT_MULTIPART_UPLOAD_PARTKEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=ONE, keyLocationInfo=[blockID { containerBlockID { containerID: 6 localID: 103127415126327331 } blockCommitSequenceId: 18 } offset: 0 length: 5242880 createVersion: 0 pipeline { leaderID: "" members { uuid: "a5fba53c-3aa9-48a7-8272-34c606f93bc6" ipAddress: "10.65.49.251" hostName: "bh-ozone-3.vpc.cloudera.com" ports { name: "RATIS" value: 9858 }
ports { name: "STANDALONE" value: 9859 } networkName: "a5fba53c-3aa9-48a7-8272-34c606f93bc6" networkLocation: "/default-rack" } members { uuid: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d" ipAddress: "10.65.51.23" hostName: "bh-ozone-4.vpc.cloudera.com" ports { name: "RATIS" value: 9858 } ports { name: "STANDALONE" value: 9859 } networkName: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d" networkLocation: "/default-rack" } members { uuid: "cf8aace1-92b8-496e-aed9-f2771c83a56b" ipAddress: "10.65.53.160" hostName: "bh-ozone-2.vpc.cloudera.com" ports { name: "RATIS" value: 9858 } ports { name: "STANDALONE" value: 9859 } networkName: "cf8aace1-92b8-496e-aed9-f2771c83a56b" networkLocation: "/default-rack" } state: PIPELINE_OPEN type: RATIS factor: THREE id { id: "99954bc5-a77a-4546-87b4-a45b89d6ecbf" } } ]} | ret=SUCCESS | 2019-11-12 14:57:59,811 | INFO | OMAudit | user=root | ip=10.65.53.160 | op=ALLOCATE_KEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=ONE, keyLocationInfo=[]} | ret=SUCCESS | 2019-11-12 14:57:59,819 | INFO | OMAudit | user=root | ip=10.65.53.160 | op=ALLOCATE_BLOCK {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=THREE, keyLocationInfo=[], clientID=103127415508860966} | ret=SUCCESS | 2019-11-12 14:58:00,016 | INFO | OMAudit | user=root | ip=10.65.53.160 | op=COMMIT_MULTIPART_UPLOAD_PARTKEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=ONE, keyLocationInfo=[blockID { containerBlockID { containerID: 4 localID: 103127415509385252 } blockCommitSequenceId: 22 } offset: 0 length: 5242880 createVersion: 0 pipeline { leaderID: "" members { uuid: "a5fba53c-3aa9-48a7-8272-34c606f93bc6" ipAddress: "10.65.49.251" hostName: "bh-ozone-3.vpc.cloudera.com" ports { name: "RATIS" value: 9858 } ports { name: "STANDALONE" value: 9859 } 
networkName: "a5fba53c-3aa9-48a7-8272-34c606f93bc6" networkLocation: "/default-rack" } members { uuid: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d" ipAddress: "10.65.51.23" hostName: "bh-ozone-4.vpc.cloudera.com" ports { name: "RATIS" value: 9858 } ports { name: "STANDALONE" value: 9859 } networkName: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d" networkLocation: "/default-rack" } members { uuid: "cf8aace1-92b8-496e-aed9-f2771c83a56b" ipAddress: "10.65.53.160" hostName: "bh-ozone-2.vpc.cloudera.com" ports { name: "RATIS" value: 9858 } ports { name: "STANDALONE" value: 9859 } networkName: "cf8aace1-92b8-496e-aed9-f2771c83a56b" networkLocation: "/default-rack" } state: PIPELINE_OPEN type: RATIS factor: THREE id { id: "99954bc5-a77a-4546-87b4-a45b89d6ecbf" } } ]} | ret=SUCCESS | 2019-11-12 14:58:39,710 | ERROR | OMAudit | user=root | ip=10.65.53.160 | op=COMPLETE_MULTIPART_UPLOAD {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, key=key12, dataSize=0, replicationType=RATIS, replicationFactor=ONE, keyLocationInfo=[], multipartList={1=/s3dfb57b2e5f36c1f893dbc12dd66897d4/b1234/key123103127415125868581, 2=/s3dfb57b2e5f36c1f893dbc12dd66897d4/b1234/key123103127415508860966}} | ret=FAILURE | MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: Complete Multipart Upload Failed: volume: s3dfb57b2e5f36c1f893dbc12dd66897d4bucket: b1234key: key12 at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCompleteRequest.validateAndUpdateCache(S3MultipartUploadCompleteRequest.java:195) at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.java:217) 2019-11-12 14:58:49,503 | ERROR | OMAudit | user=root | ip=10.65.53.160 | op=COMPLETE_MULTIPART_UPLOAD {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, key=key123, dataSize=0, replicationType=RATIS, replicationFactor=ONE, keyLocationInfo=[], 
multipartList={1=/s3dfb57b2e5f36c1f893dbc12dd66897d4/b1234/key123103127415125868581, 2=/s3dfb57b2e5f36c1f893dbc12dd66897d4/b1234/key123103127415508860966}} | ret=FAILURE | NO_SUCH_MULTIPART_UPLOAD_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Complete Multipart Upload Failed: volume: s3dfb57b2e5f36c1f893dbc12dd66897d4bucket: b1234key: key123 at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCompleteRequest.validateAndUpdateCache(S3MultipartUploadCompleteRequest.java:142) at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.java:217)
2019-11-12 14:59:12,951 | INFO | OMAudit | user=root | ip=10.65.53.160 | op=COMPLETE_MULTIPART_UPLOAD {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, key=key123, dataSize=0, replicationType=RATIS, replicationFactor=ONE, keyLocationInfo=[], multipartList={1=/s3dfb57b2e5f36c1f893dbc12dd66897d4/b1234/key123103127415125868581, 2=/s3dfb57b2e5f36c1f893dbc12dd66897d4/b1234/key123103127415508860966}} | ret=SUCCESS |
{code}
I have tried setting up goofys, with no luck: I get an "Unable to mount file system" error (see syslog for details), and I am still not able to find the root cause.
{code:java}
[root@bh-ozone-2 ozone-0.5.0-SNAPSHOT]# ./goofys --endpoint http://localhost:9878 b12345 /root/s3/
2019/11/12 15:20:26.428553 main.FATAL Unable to mount file system, see syslog for details
{code}
Any help in resolving this would be appreciated. I am coming up with a freon test for S3MPU to run these tests.
From the log I suspect that the complete multipart upload request carries wrong information, which causes this error; once it has failed, an abort MPU call follows, and finally, when you commit the part, it reports NO_SUCH_MULTIPART_UPLOAD.

> Multipart upload report errors while writing to ozone Ratis pipeline
> --------------------------------------------------------------------
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Manager
> Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM
> Reporter: Li Cheng
> Assignee: Bharat Viswanadham
> Priority: Blocker
> Fix For: 0.5.0
>
> Attachments: 2019-11-06_18_13_57_422_ERROR, hs_err_pid9340.log, image-2019-10-31-18-56-56-177.png, om_audit_log_plc_1570863541668_9278.txt
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say it's VM0.
> I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path on VM0, while reading data from VM0 local disk and write to mount path. The dataset has various sizes of files from 0 byte to GB-level and it has a number of ~50,000 files.
> The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors related with Multipart upload. This error eventually causes the writing to terminate and OM to be closed.
>
> Updated on 11/06/2019:
> See new multipart upload error NO_SUCH_MULTIPART_UPLOAD_ERROR; full logs are in the attachment.
> 2019-11-05 18:12:37,766 ERROR org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest: MultipartUpload Commit is failed for Key:./20191012/plc_1570863541668_9278 in Volume/Bucket s325d55ad283aa400af464c76d713c07ad/ozone-test
> NO_SUCH_MULTIPART_UPLOAD_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: No such Multipart upload is with specified uploadId fcda8608-b431-48b7-8386-0a332f1a709a-103084683261641950
> at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:156)
> at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.java:217)
> at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:132)
> at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72)
> at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:100)
> at org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
>
> Updated on 10/28/2019:
> See MISMATCH_MULTIPART_LIST error.
> 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete Multipart Upload Request for bucket: ozone-test, key: 20191012/plc_1570863541668_9278
> MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: Complete Multipart Upload Failed: volume: s3c89e813c80ffcea9543004d57b2a1239bucket: ozone-testkey: 20191012/plc_1570863541668_9278
> at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732)
> at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.completeMultipartUpload(OzoneManagerProtocolClientSideTranslatorPB.java:1104)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:66)
> at com.sun.proxy.$Proxy82.completeMultipartUpload(Unknown Source)
> at org.apache.hadoop.ozone.client.rpc.RpcClient.completeMultipartUpload(RpcClient.java:883)
> at org.apache.hadoop.ozone.client.OzoneBucket.completeMultipartUpload(OzoneBucket.java:445)
> at org.apache.hadoop.ozone.s3.endpoint.ObjectEndpoint.completeMultipartUpload(ObjectEndpoint.java:498)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76)
> at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148)
> at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:191)
> at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:200)
> at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:103)
> at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:493)
>
> The following error has been resolved in https://issues.apache.org/jira/browse/HDDS-2322:
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with exit status 2: OMDoubleBuffer flush threadOMDoubleBufferFlushThreadencountered Throwable error
> java.util.ConcurrentModificationException
> at java.util.TreeMap.forEach(TreeMap.java:1004)
> at org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
> at org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
> at org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
> at org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
> at org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
> at org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
> at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
> at java.util.Iterator.forEachRemaining(Iterator.java:116)
> at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
> at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:

--
This message was sent by Atlassian Jira
(v8.3.4#803005)