[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972874#comment-16972874
 ] 

Bharat Viswanadham edited comment on HDDS-2356 at 11/12/19 11:22 PM:
---------------------------------------------------------------------

Hi [~timmylicheng]

Thanks for sharing the logs.

I see an abort multipart upload request for the key plc_1570863541668_9278 after 
the complete multipart upload failed.

 

 
{code:java}
2019-11-08 20:08:24,830 | ERROR | OMAudit | user=root | ip=9.134.50.210 | 
op=COMPLETE_MULTIPART_UPLOAD
{volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, 
key=plc_1570863541668_9278, dataSize=0, replicationType=RATIS, 
replicationFactor=ONE, keyLocationIn       fo=[], multipartList=[partNumber: 1  
 5626 partName: 
"/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102209374356085"
   5627 partName: 
"/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102209374487158"
     . .   5911 partName: 
"/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102211984655258"
  5912 ]} | ret=FAILURE | INVALID_PART 
org.apache.hadoop.ozone.om.exceptions.OMException: Complete Multipart Upload 
Failed: volume: s325d55ad283aa400af464c76d713c07adbucket: ozone-testkey: 
plc_1570863541668_9278
2019-11-08 20:08:24,963 | INFO  | OMAudit | user=root | ip=9.134.50.210 | 
op=ABORT_MULTIPART_UPLOAD
{volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, 
key=plc_1570863541668_9278, dataSize=0, replicationType=RATIS, 
replicationFactor=ONE, keyLocationInfo=       []} 
{code}
 

After that, allocateBlock still continues for the key, because the 
abortMultipartUpload request does not remove the key's entries from the 
openKeyTable. Abort removes only the entry that was created during the 
initiateMPU request (the MultipartInfo table entry), which is why some time 
later you see the NO_SUCH_MULTIPART_UPLOAD error during 
commitMultipartUploadKey: the entry has already been removed from the 
MultipartInfo table. (The strange thing I have observed is that the clientID 
does not match any of the names in the partList, even though the last portion 
of each partName is the clientID.)
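
To make that sequence concrete, here is a minimal, self-contained sketch (not the real OM code; the table names and method shapes are simplified, hypothetical stand-ins) of why a part commit after abort reports NO_SUCH_MULTIPART_UPLOAD while the open-key entries created by allocateBlock linger:
{code:java}
import java.util.HashMap;
import java.util.Map;

public class MpuAbortSketch {
  // Hypothetical, simplified stand-ins for OM's MultipartInfo table and openKeyTable.
  static Map<String, String> multipartInfoTable = new HashMap<>(); // key -> uploadID
  static Map<String, String> openKeyTable = new HashMap<>();       // key+clientID -> uploadID

  static void initiateMPU(String key, String uploadID) {
    multipartInfoTable.put(key, uploadID);
  }

  static void allocatePart(String key, long clientID, String uploadID) {
    // allocateKey/allocateBlock create an open-key entry for each part.
    openKeyTable.put(key + clientID, uploadID);
  }

  static void abortMPU(String key) {
    // Abort removes only the entry created by initiateMPU (MultipartInfo table);
    // the open-key entries for in-flight parts are left behind.
    multipartInfoTable.remove(key);
  }

  static String commitPart(String key, long clientID) {
    if (!multipartInfoTable.containsKey(key)) {
      return "NO_SUCH_MULTIPART_UPLOAD_ERROR"; // what shows up in the audit log
    }
    openKeyTable.remove(key + clientID);
    return "SUCCESS";
  }

  public static void main(String[] args) {
    initiateMPU("key123", "upload-1");
    allocatePart("key123", 103127415125868581L, "upload-1");
    abortMPU("key123"); // abort issued after the failed complete MPU
    System.out.println(commitPart("key123", 103127415125868581L)); // NO_SUCH_MULTIPART_UPLOAD_ERROR
    System.out.println(openKeyTable); // the part's open-key entry still lingers
  }
}
{code}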

 

Also, from the OM audit log I see partNumber 1 followed by a list of multipart 
part names; I am not sure if some of the log is truncated here, because it 
should show partName/partNumber pairs.
 # If you can confirm what parts OM has for this key, you can get them from 
listParts (but this has to be done before the abort request); see the sketch 
after this list.
 # Check in the OM audit log what partList we get for this key; I am not sure 
whether it is simply truncated in the uploaded log.
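
For item 1, a rough sketch of calling listParts from the Ozone Java client is below. I am writing the method names and signatures from memory, so treat them as assumptions and adjust to the client API in your build; the volume/bucket/key are taken from your log, and <uploadID> is the ID returned by initiateMultipartUpload.
{code:java}
import org.apache.hadoop.hdds.conf.OzoneConfiguration;
import org.apache.hadoop.ozone.client.OzoneBucket;
import org.apache.hadoop.ozone.client.OzoneClient;
import org.apache.hadoop.ozone.client.OzoneClientFactory;
import org.apache.hadoop.ozone.client.OzoneMultipartUploadPartListParts;

public class ListPartsCheck {
  public static void main(String[] args) throws Exception {
    OzoneClient client = OzoneClientFactory.getRpcClient(new OzoneConfiguration());
    OzoneBucket bucket = client.getObjectStore()
        .getVolume("s325d55ad283aa400af464c76d713c07ad")
        .getBucket("ozone-test");
    // Dump the parts OM currently knows about for this key/uploadID.
    OzoneMultipartUploadPartListParts parts =
        bucket.listParts("plc_1570863541668_9278", "<uploadID>", 0, 1000);
    parts.getPartInfoList().forEach(p ->
        System.out.println(p.getPartNumber() + " -> " + p.getPartName()));
    client.close();
  }
}
{code}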

 

On my cluster the audit logs look like below: for completeMultipartUpload I can 
see both partNumber and partName, whereas in the uploaded log I don't see that.

 

 
{code:java}
2019-11-12 14:57:18,580 | INFO  | OMAudit | user=root | ip=10.65.53.160 | 
op=INITIATE_MULTIPART_UPLOAD {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, 
bucket=b1234, key=key123, dataSize=0, replicationType=RATIS, 
replicationFactor=THREE, keyLocationInfo=[]} | ret=SUCCESS | 
2019-11-12 14:57:53,967 | INFO  | OMAudit | user=root | ip=10.65.53.160 | 
op=ALLOCATE_KEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, 
key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=ONE, 
keyLocationInfo=[]} | ret=SUCCESS | 
2019-11-12 14:57:53,974 | INFO  | OMAudit | user=root | ip=10.65.53.160 | 
op=ALLOCATE_BLOCK {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, 
key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=THREE, 
keyLocationInfo=[], clientID=103127415125868581} | ret=SUCCESS | 
2019-11-12 14:57:54,154 | INFO  | OMAudit | user=root | ip=10.65.53.160 | 
op=COMMIT_MULTIPART_UPLOAD_PARTKEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, 
bucket=b1234, key=key123, dataSize=5242880, replicationType=RATIS, 
replicationFactor=ONE, keyLocationInfo=[blockID {
  containerBlockID {
    containerID: 6
    localID: 103127415126327331
  }
  blockCommitSequenceId: 18
}
offset: 0
length: 5242880
createVersion: 0
pipeline {
  leaderID: ""
  members {
    uuid: "a5fba53c-3aa9-48a7-8272-34c606f93bc6"
    ipAddress: "10.65.49.251"
    hostName: "bh-ozone-3.vpc.cloudera.com"
    ports {
      name: "RATIS"
      value: 9858
    }
    ports {
      name: "STANDALONE"
      value: 9859
    }
    networkName: "a5fba53c-3aa9-48a7-8272-34c606f93bc6"
    networkLocation: "/default-rack"
  }
  members {
    uuid: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d"
    ipAddress: "10.65.51.23"
    hostName: "bh-ozone-4.vpc.cloudera.com"
    ports {
      name: "RATIS"
      value: 9858
    }
    ports {
      name: "STANDALONE"
      value: 9859
    }
    networkName: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d"
    networkLocation: "/default-rack"
  }
  members {
    uuid: "cf8aace1-92b8-496e-aed9-f2771c83a56b"
    ipAddress: "10.65.53.160"
    hostName: "bh-ozone-2.vpc.cloudera.com"
    ports {
      name: "RATIS"
      value: 9858
    }
    ports {
      name: "STANDALONE"
      value: 9859
    }
    networkName: "cf8aace1-92b8-496e-aed9-f2771c83a56b"
    networkLocation: "/default-rack"
  }
  state: PIPELINE_OPEN
  type: RATIS
  factor: THREE
  id {
    id: "99954bc5-a77a-4546-87b4-a45b89d6ecbf"
  }
}
]} | ret=SUCCESS | 
2019-11-12 14:57:59,811 | INFO  | OMAudit | user=root | ip=10.65.53.160 | 
op=ALLOCATE_KEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, 
key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=ONE, 
keyLocationInfo=[]} | ret=SUCCESS | 
2019-11-12 14:57:59,819 | INFO  | OMAudit | user=root | ip=10.65.53.160 | 
op=ALLOCATE_BLOCK {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, 
key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=THREE, 
keyLocationInfo=[], clientID=103127415508860966} | ret=SUCCESS | 
2019-11-12 14:58:00,016 | INFO  | OMAudit | user=root | ip=10.65.53.160 | 
op=COMMIT_MULTIPART_UPLOAD_PARTKEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, 
bucket=b1234, key=key123, dataSize=5242880, replicationType=RATIS, 
replicationFactor=ONE, keyLocationInfo=[blockID {
  containerBlockID {
    containerID: 4
    localID: 103127415509385252
  }
  blockCommitSequenceId: 22
}
offset: 0
length: 5242880
createVersion: 0
pipeline {
  leaderID: ""
  members {
    uuid: "a5fba53c-3aa9-48a7-8272-34c606f93bc6"
    ipAddress: "10.65.49.251"
    hostName: "bh-ozone-3.vpc.cloudera.com"
    ports {
      name: "RATIS"
      value: 9858
    }
    ports {
      name: "STANDALONE"
      value: 9859
    }
    networkName: "a5fba53c-3aa9-48a7-8272-34c606f93bc6"
    networkLocation: "/default-rack"
  }
  members {
    uuid: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d"
    ipAddress: "10.65.51.23"
    hostName: "bh-ozone-4.vpc.cloudera.com"
    ports {
      name: "RATIS"
      value: 9858
    }
    ports {
      name: "STANDALONE"
      value: 9859
    }
    networkName: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d"
    networkLocation: "/default-rack"
  }
  members {
    uuid: "cf8aace1-92b8-496e-aed9-f2771c83a56b"
    ipAddress: "10.65.53.160"
    hostName: "bh-ozone-2.vpc.cloudera.com"
    ports {
      name: "RATIS"
      value: 9858
    }
    ports {
      name: "STANDALONE"
      value: 9859
    }
    networkName: "cf8aace1-92b8-496e-aed9-f2771c83a56b"
    networkLocation: "/default-rack"
  }
  state: PIPELINE_OPEN
  type: RATIS
  factor: THREE
  id {
    id: "99954bc5-a77a-4546-87b4-a45b89d6ecbf"
  }
}
]} | ret=SUCCESS | 
2019-11-12 14:58:39,710 | ERROR | OMAudit | user=root | ip=10.65.53.160 | 
op=COMPLETE_MULTIPART_UPLOAD {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, 
bucket=b1234, key=key12, dataSize=0, replicationType=RATIS, 
replicationFactor=ONE, keyLocationInfo=[], 
multipartList={1=/s3dfb57b2e5f36c1f893dbc12dd66897d4/b1234/key123103127415125868581,
 2=/s3dfb57b2e5f36c1f893dbc12dd66897d4/b1234/key123103127415508860966}} | 
ret=FAILURE | MISMATCH_MULTIPART_LIST 
org.apache.hadoop.ozone.om.exceptions.OMException: Complete Multipart Upload 
Failed: volume: s3dfb57b2e5f36c1f893dbc12dd66897d4bucket: b1234key: key12
 at 
org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCompleteRequest.validateAndUpdateCache(S3MultipartUploadCompleteRequest.java:195)
 at 
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.java:217)
 
2019-11-12 14:58:49,503 | ERROR | OMAudit | user=root | ip=10.65.53.160 | 
op=COMPLETE_MULTIPART_UPLOAD {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, 
bucket=b1234, key=key123, dataSize=0, replicationType=RATIS, 
replicationFactor=ONE, keyLocationInfo=[], 
multipartList={1=/s3dfb57b2e5f36c1f893dbc12dd66897d4/b1234/key123103127415125868581,
 2=/s3dfb57b2e5f36c1f893dbc12dd66897d4/b1234/key123103127415508860966}} | 
ret=FAILURE | NO_SUCH_MULTIPART_UPLOAD_ERROR 
org.apache.hadoop.ozone.om.exceptions.OMException: Complete Multipart Upload 
Failed: volume: s3dfb57b2e5f36c1f893dbc12dd66897d4bucket: b1234key: key123
 at 
org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCompleteRequest.validateAndUpdateCache(S3MultipartUploadCompleteRequest.java:142)
 at 
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.java:217)
 
2019-11-12 14:59:12,951 | INFO  | OMAudit | user=root | ip=10.65.53.160 | 
op=COMPLETE_MULTIPART_UPLOAD {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, 
bucket=b1234, key=key123, dataSize=0, replicationType=RATIS, 
replicationFactor=ONE, keyLocationInfo=[], 
multipartList={1=/s3dfb57b2e5f36c1f893dbc12dd66897d4/b1234/key123103127415125868581,
 2=/s3dfb57b2e5f36c1f893dbc12dd66897d4/b1234/key123103127415508860966}} | 
ret=SUCCESS | 
{code}
 

 

I have tried setting up goofys with no luck; I get an "unable to mount file 
system, see syslog" error. (I have not been able to find the root cause yet.)

 

 
{code:java}
[root@bh-ozone-2 ozone-0.5.0-SNAPSHOT]# ./goofys --endpoint 
http://localhost:9878 b12345 /root/s3/
2019/11/12 15:20:26.428553 main.FATAL Unable to mount file system, see syslog 
for details
{code}
 

Any information that helps resolve this would be greatly appreciated.

 

I am coming up with a Freon test for S3 MPU so we can run these tests.

 

From the log I suspect that the complete multipart upload request carries wrong 
information, which causes this error; once it fails, an abort MPU call follows, 
and finally, when you commit a part, it reports NO_SUCH_MULTIPART_UPLOAD. The 
snippet below shows the kind of cross-check I mean.
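
As a quick way to cross-check that suspicion, the snippet below extracts the trailing clientID from each partName in the COMPLETE_MULTIPART_UPLOAD multipartList and compares it against the clientIDs from the ALLOCATE_BLOCK audit entries. The values are copied from my cluster's log above; that partName looks like "/volume/bucket/key" + clientID is an observation from the log format, not a documented contract.
{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PartNameCrossCheck {
  public static void main(String[] args) {
    // partNames from the COMPLETE_MULTIPART_UPLOAD audit entry above.
    String keyPrefix = "/s3dfb57b2e5f36c1f893dbc12dd66897d4/b1234/key123";
    List<String> partNames = Arrays.asList(
        keyPrefix + "103127415125868581",
        keyPrefix + "103127415508860966");
    // clientIDs from the ALLOCATE_BLOCK audit entries above.
    Set<String> allocatedClientIds = new HashSet<>(Arrays.asList(
        "103127415125868581", "103127415508860966"));

    for (String partName : partNames) {
      // The trailing digits after the key should be the clientID of that part.
      String clientId = partName.substring(keyPrefix.length());
      System.out.println(partName + " -> clientID " + clientId
          + (allocatedClientIds.contains(clientId) ? " (matches)" : " (NO MATCH)"));
    }
  }
}
{code}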

 

 

 


> Multipart upload report errors while writing to ozone Ratis pipeline
> --------------------------------------------------------------------
>
>                 Key: HDDS-2356
>                 URL: https://issues.apache.org/jira/browse/HDDS-2356
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: Ozone Manager
>    Affects Versions: 0.4.1
>         Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>            Reporter: Li Cheng
>            Assignee: Bharat Viswanadham
>            Priority: Blocker
>             Fix For: 0.5.0
>
>         Attachments: 2019-11-06_18_13_57_422_ERROR, hs_err_pid9340.log, 
> image-2019-10-31-18-56-56-177.png, om_audit_log_plc_1570863541668_9278.txt
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path 
> on VM0, while reading data from VM0 local disk and write to mount path. The 
> dataset has various sizes of files from 0 byte to GB-level and it has a 
> number of ~50,000 files. 
> The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I 
> look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors 
> related with Multipart upload. This error eventually causes the  writing to 
> terminate and OM to be closed. 
>  
> Updated on 11/06/2019:
> See new multipart upload error NO_SUCH_MULTIPART_UPLOAD_ERROR and full logs 
> are in the attachment.
>  2019-11-05 18:12:37,766 ERROR 
> org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest:
>  MultipartUpload Commit is failed for Key:./2
> 0191012/plc_1570863541668_9278 in Volume/Bucket 
> s325d55ad283aa400af464c76d713c07ad/ozone-test
> NO_SUCH_MULTIPART_UPLOAD_ERROR 
> org.apache.hadoop.ozone.om.exceptions.OMException: No such Multipart upload 
> is with specified uploadId fcda8608-b431-48b7-8386-
> 0a332f1a709a-103084683261641950
> at 
> org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:1
> 56)
> at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.
> java:217)
> at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:132)
> at 
> org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72)
> at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:100)
> at 
> org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
>  
> Updated on 10/28/2019:
> See MISMATCH_MULTIPART_LIST error.
>  
> 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete 
> Multipart Upload Request for bucket: ozone-test, key: 
> 20191012/plc_1570863541668_927
>  8
>  MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: 
> Complete Multipart Upload Failed: volume: 
> s3c89e813c80ffcea9543004d57b2a1239bucket:
>  ozone-testkey: 20191012/plc_1570863541668_9278
>  at 
> org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732)
>  at 
> org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.completeMultipartUpload(OzoneManagerProtocolClientSideTranslatorPB
>  .java:1104)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:497)
>  at 
> org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:66)
>  at com.sun.proxy.$Proxy82.completeMultipartUpload(Unknown Source)
>  at 
> org.apache.hadoop.ozone.client.rpc.RpcClient.completeMultipartUpload(RpcClient.java:883)
>  at 
> org.apache.hadoop.ozone.client.OzoneBucket.completeMultipartUpload(OzoneBucket.java:445)
>  at 
> org.apache.hadoop.ozone.s3.endpoint.ObjectEndpoint.completeMultipartUpload(ObjectEndpoint.java:498)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:497)
>  at 
> org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76)
>  at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148)
>  at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:191)
>  at 
> org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:200)
>  at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:103)
>  at 
> org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:493)
>  
> The following errors has been resolved in 
> https://issues.apache.org/jira/browse/HDDS-2322. 
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with 
> exit status 2: OMDoubleBuffer flush 
> threadOMDoubleBufferFlushThreadencountered Throwable error
>  java.util.ConcurrentModificationException
>  at java.util.TreeMap.forEach(TreeMap.java:1004)
>  at 
> org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>  at 
> org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>  at 
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>  at 
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>  at java.lang.Thread.run(Thread.java:745)
>  2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:


