Ayush Goyal created HADOOP-18298:
------------------------------------

             Summary: Hadoop AWS | Staging committer Multipartupload not 
implemented properly
                 Key: HADOOP-18298
                 URL: https://issues.apache.org/jira/browse/HADOOP-18298
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs/s3
    Affects Versions: 3.3.1
            Reporter: Ayush Goyal


In the Hadoop AWS staging committer
(org.apache.hadoop.fs.s3a.commit.staging.StagingCommitter), the committer
uploads files from the local filesystem to S3 in the method commitTaskInternal.
That method calls uploadFileToPendingCommit of CommitOperations to upload each
file using a multipart upload.

A multipart upload consists of three steps:

1) Initiate the multipart upload.

2) Split the file into parts and upload each part.

3) Merge all the parts and finalize the multipart upload.
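
The three steps above can be illustrated with a toy in-memory model. This is only a sketch to show why uploaded parts stay invisible until the upload is completed. It does not use the AWS SDK or any S3A class, and the names MultipartModel, initiate, uploadPart, complete and get are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.UUID;

/** Toy model of an S3-style object store (hypothetical, not a real API). */
final class MultipartModel {
    // uploadId -> (partNumber -> bytes) for uploads that are not yet completed
    private final Map<String, TreeMap<Integer, byte[]>> pendingParts = new HashMap<>();
    // uploadId -> destination key
    private final Map<String, String> pendingKeys = new HashMap<>();
    // key -> contents of objects that readers can see
    private final Map<String, byte[]> objects = new HashMap<>();

    /** Step 1: initiate the upload and hand back an upload ID. */
    String initiate(String key) {
        String uploadId = UUID.randomUUID().toString();
        pendingParts.put(uploadId, new TreeMap<>());
        pendingKeys.put(uploadId, key);
        return uploadId;
    }

    /** Step 2: store one part; nothing becomes visible under the key yet. */
    void uploadPart(String uploadId, int partNumber, byte[] data) {
        TreeMap<Integer, byte[]> parts = pendingParts.get(uploadId);
        if (parts == null) {
            throw new IllegalStateException("Unknown upload: " + uploadId);
        }
        parts.put(partNumber, data.clone());
    }

    /** Step 3: merge the parts in part-number order and publish the object. */
    void complete(String uploadId) {
        TreeMap<Integer, byte[]> parts = pendingParts.remove(uploadId);
        if (parts == null) {
            throw new IllegalStateException("Unknown upload: " + uploadId);
        }
        String key = pendingKeys.remove(uploadId);
        ByteArrayOutputStream merged = new ByteArrayOutputStream();
        for (byte[] part : parts.values()) {
            merged.write(part, 0, part.length);
        }
        objects.put(key, merged.toByteArray());
    }

    /** Returns the object only if an upload to this key has been completed. */
    Optional<byte[]> get(String key) {
        return Optional.ofNullable(objects.get(key));
    }

    public static void main(String[] args) {
        MultipartModel store = new MultipartModel();
        String key = "output/part-00000.parquet"; // illustrative key only
        String uploadId = store.initiate(key);
        store.uploadPart(uploadId, 1, "hello ".getBytes(StandardCharsets.UTF_8));
        store.uploadPart(uploadId, 2, "world".getBytes(StandardCharsets.UTF_8));
        System.out.println("visible before complete: " + store.get(key).isPresent());
        store.complete(uploadId);
        System.out.println("visible after complete: " + store.get(key).isPresent());
    }
}
```

In this model the object is absent after steps 1 and 2 and appears only once complete has run, so main reports false before completion and true after it.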

 

In the implementation of uploadFileToPendingCommit, only the first two steps
are implemented. The third step is missing, so the parts are uploaded but never
merged, and at the end of the job there are no files in the destination
directory.

S3 logs before implementing the third step:
{code:java}
2022-05-30T13:49:31:000 [200 OK] s3.NewMultipartUpload 
localhost:9000/minio-feature-testing/spark-job/processed/output-parquet-staging-7/part-00000-ce0a965f-622a-4950-bb4b-550470883134-c000-b552fb34-6156-4aa8-9085-679ad14fab6e.snappy.parquet?uploads
  240b:c1d1:123:664f:c5d2:2::               8.677ms      ↑ 137 B ↓ 724 B
2022-05-30T13:49:31:000 [200 OK] s3.PutObjectPart 
localhost:9000/minio-feature-testing/spark-job/processed/output-parquet-staging-7/part-00000-ce0a965f-622a-4950-bb4b-550470883134-c000-b552fb34-6156-4aa8-9085-679ad14fab6e.snappy.parquet?uploadId=f3beae8e-3001-48be-9bc4-306b71940e50&partNumber=1
  240b:c1d1:123:664f:c5d2:2::                443.156ms    ↑ 51 KiB ↓ 325 B
2022-05-30T13:49:32:000 [200 OK] s3.ListObjectsV2 
localhost:9000/minio-feature-testing/?list-type=2&delimiter=%2F&max-keys=2&prefix=spark-job%2Fprocessed%2Foutput-parquet-staging-7%2F_SUCCESS%2F&fetch-owner=false
  240b:c1d1:123:664f:c5d2:2::                3.414ms      ↑ 137 B ↓ 646 B
2022-05-30T13:49:32:000 [200 OK] s3.PutObject 
localhost:9000/minio-feature-testing/spark-job/processed/output-parquet-staging-7/_SUCCESS
 240b:c1d1:123:664f:c5d2:2::                52.734ms     ↑ 8.7 KiB ↓ 380 B
2022-05-30T13:49:32:000 [200 OK] s3.DeleteMultipleObjects 
localhost:9000/minio-feature-testing/?delete
  240b:c1d1:123:664f:c5d2:2::                73.954ms     ↑ 350 B ↓ 432 B
2022-05-30T13:49:32:000 [404 Not Found] s3.HeadObject 
localhost:9000/minio-feature-testing/spark-job/processed/output-parquet-staging-7/_temporary
 240b:c1d1:123:664f:c5d2:2::                2.658ms      ↑ 137 B ↓ 291 B
2022-05-30T13:49:32:000 [200 OK] s3.ListObjectsV2 
localhost:9000/minio-feature-testing/?list-type=2&delimiter=%2F&max-keys=2&prefix=spark-job%2Fprocessed%2Foutput-parquet-staging-7%2F_temporary%2F&fetch-owner=false
  240b:c1d1:123:664f:c5d2:2::                 4.807ms      ↑ 137 B ↓ 648 B
2022-05-30T13:49:32:000 [200 OK] s3.ListMultipartUploads 
localhost:9000/minio-feature-testing/?uploads&prefix=spark-job%2Fprocessed%2Foutput-parquet-staging-7%2F
  240b:c0e0:102:553e:b4c2:2::               1.081ms      ↑ 137 B ↓ 776 B
2022-05-30T13:49:32:000 [404 Not Found] s3.HeadObject 
localhost:9000/minio-feature-testing/spark-job/processed/output-parquet-staging-7/.spark-staging-ce0a965f-622a-4950-bb4b-550470883134
 240b:c1d1:123:664f:c5d2:2::                 5.68ms       ↑ 137 B ↓ 291 B
2022-05-30T13:49:32:000 [200 OK] s3.ListObjectsV2 
localhost:9000/minio-feature-testing/?list-type=2&delimiter=%2F&max-keys=2&prefix=spark-job%2Fprocessed%2Foutput-parquet-staging-7%2F.spark-staging-ce0a965f-622a-4950-bb4b-550470883134%2F&fetch-owner=false
  240b:c1d1:123:664f:c5d2:2::              2.452ms      ↑ 137 B ↓ 689 B
  {code}
Here, after s3.PutObjectPart there is no s3.CompleteMultipartUpload call for
the third step.
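
A trace like the one above can also be checked mechanically for this pattern. The following is a minimal sketch, assuming the trace format shown in this issue (an operation name such as s3.PutObjectPart followed by the request URL, possibly on the next line). The class name UncompletedUploads and the method find are hypothetical, and the sample in main is abbreviated from the first trace with the object path shortened to a placeholder.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Finds upload IDs that received parts but were never completed (sketch). */
final class UncompletedUploads {
    // Operation name, then whitespace (possibly a line break), then the URL.
    private static final Pattern ENTRY = Pattern.compile("s3\\.(\\w+)\\s+(\\S+)");
    // The uploadId query parameter inside a request URL.
    private static final Pattern UPLOAD_ID = Pattern.compile("[?&]uploadId=([^&\\s]+)");

    static Set<String> find(String log) {
        Set<String> withParts = new LinkedHashSet<>();
        Set<String> completed = new LinkedHashSet<>();
        Matcher entry = ENTRY.matcher(log);
        while (entry.find()) {
            Matcher id = UPLOAD_ID.matcher(entry.group(2));
            if (!id.find()) {
                continue; // request carries no upload ID, e.g. ListObjectsV2
            }
            switch (entry.group(1)) {
                case "PutObjectPart":
                    withParts.add(id.group(1));
                    break;
                case "CompleteMultipartUpload":
                    completed.add(id.group(1));
                    break;
                default:
                    break; // other operations are ignored in this sketch
            }
        }
        // Whatever remains had parts uploaded but no completion in the trace.
        withParts.removeAll(completed);
        return withParts;
    }

    public static void main(String[] args) {
        // Abbreviated sample; "bucket/key" is a placeholder for the real path.
        String log =
            "2022-05-30T13:49:31:000 [200 OK] s3.NewMultipartUpload \n"
            + "localhost:9000/bucket/key?uploads\n"
            + "2022-05-30T13:49:31:000 [200 OK] s3.PutObjectPart \n"
            + "localhost:9000/bucket/key"
            + "?uploadId=f3beae8e-3001-48be-9bc4-306b71940e50&partNumber=1\n";
        System.out.println(find(log));
    }
}
```

The sketch only looks at the part of the trace it is given. An upload that is completed later, outside the captured window, would still be reported as open.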

 

S3 logs after implementing the third step:
{code:java}
2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload 
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads
  240b:c1d1:123:664f:c5d2:2::               9.116ms      ↑ 137 B ↓ 750 B
2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload 
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D45/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads
  240b:c1d1:123:664f:c5d2:2::               9.416ms      ↑ 137 B ↓ 751 B
2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload 
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D45/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads
  240b:c1d1:123:664f:c5d2:2::               8.506ms      ↑ 137 B ↓ 751 B
2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload 
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads
  240b:c1d1:123:664f:c5d2:2::               9.815ms      ↑ 137 B ↓ 750 B
2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload 
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D30/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads
  240b:c1d1:123:664f:c5d2:2::               10.09ms      ↑ 137 B ↓ 751 B
2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload 
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D15/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads
  240b:c1d1:123:664f:c5d2:2::               9.851ms      ↑ 137 B ↓ 751 B
2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload 
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D17/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads
  240b:c1d1:123:664f:c5d2:2::               9.006ms      ↑ 137 B ↓ 750 B
2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload 
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D15/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads
  240b:c1d1:123:664f:c5d2:2::               9.217ms      ↑ 137 B ↓ 751 B
2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart 
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D45/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=7da87f0a-f8ff-4f9c-b877-b2fdd18d3c5f&partNumber=1
  240b:c1d1:123:664f:c5d2:2::               817.474ms    ↑ 52 KiB ↓ 325 B
2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart 
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D15/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=782769d0-43f1-43b8-aae0-54ac4c8c6603&partNumber=1
  240b:c1d1:123:664f:c5d2:2::               818.363ms    ↑ 85 KiB ↓ 325 B
2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart 
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D17/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=2c509073-e2b6-4d0a-a65a-bb4f154a432c&partNumber=1
  240b:c1d1:123:664f:c5d2:2::               819.765ms    ↑ 54 KiB ↓ 325 B
2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart 
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=c7e09609-6193-4d41-bc05-4020291725e4&partNumber=1
  240b:c1d1:123:664f:c5d2:2::               818.782ms    ↑ 55 KiB ↓ 325 B
2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart 
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D15/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=3bb4278e-455a-4dc4-af01-ed3227430590&partNumber=1
  240b:c1d1:123:664f:c5d2:2::               817.97ms     ↑ 51 KiB ↓ 325 B
2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart 
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=8fe799e3-c712-43b7-a074-a2359232de07&partNumber=1
  240b:c1d1:123:664f:c5d2:2::               819.183ms    ↑ 80 KiB ↓ 325 B
2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart 
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D45/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=c2e1477b-5457-4cbe-8fdb-4e80eaca63fe&partNumber=1
  240b:c1d1:123:664f:c5d2:2::               818.126ms    ↑ 53 KiB ↓ 325 B
2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart 
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D30/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=992167c8-fbde-4a0d-bd4d-5ce7ddd51a87&partNumber=1
  240b:c1d1:123:664f:c5d2:2::               818.176ms    ↑ 56 KiB ↓ 325 B
2022-06-17T10:56:12:000 [200 OK] s3.CompleteMultipartUpload 
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D45/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=7da87f0a-f8ff-4f9c-b877-b2fdd18d3c5f
  240b:c1d1:123:664f:c5d2:2::               632.761ms    ↑ 272 B ↓ 1.1 KiB
2022-06-17T10:56:13:000 [200 OK] s3.NewMultipartUpload 
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D17/quarter%3D15/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads
  240b:c1d1:123:664f:c5d2:2::               6.231ms      ↑ 137 B ↓ 751 B
2022-06-17T10:56:12:000 [200 OK] s3.CompleteMultipartUpload 
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D15/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=3bb4278e-455a-4dc4-af01-ed3227430590
  240b:c1d1:123:664f:c5d2:2::               697.946ms    ↑ 272 B ↓ 1.1 KiB
2022-06-17T10:56:12:000 [200 OK] s3.CompleteMultipartUpload 
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D17/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=2c509073-e2b6-4d0a-a65a-bb4f154a432c
  240b:c1d1:123:664f:c5d2:2::               714.377ms    ↑ 272 B ↓ 1.1 KiB
 {code}
 

 

Needs to be implemented:

After the uploadPart calls have finished and all upload IDs have been added to
commitData, innerCommit should be called so that the multipart upload is
completed.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
