Ayush Goyal created HADOOP-18298:
------------------------------------

             Summary: Hadoop AWS | Staging committer Multipartupload not implemented properly
                 Key: HADOOP-18298
                 URL: https://issues.apache.org/jira/browse/HADOOP-18298
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs/s3
    Affects Versions: 3.3.1
            Reporter: Ayush Goyal

In the Hadoop AWS staging committer (org.apache.hadoop.fs.s3a.commit.staging.StagingCommitter), the committer uploads files from the local filesystem to S3 in the method commitTaskInternal, which calls uploadFileToPendingCommit of CommitOperations to upload each file using a multipart upload.

A multipart upload consists of three steps:
1) Initialise the multipart upload.
2) Break the file into parts and upload the parts.
3) Merge all the parts of the file and finalize the multipart upload.

In the implementation of uploadFileToPendingCommit, only the first two steps are implemented. The third step is missing, so the parts are uploaded but never merged, and at the end of the job there are no files in the destination directory.

S3 logs before implementing the third step:
{code:java}
2022-05-30T13:49:31:000 [200 OK] s3.NewMultipartUpload localhost:9000/minio-feature-testing/spark-job/processed/output-parquet-staging-7/part-00000-ce0a965f-622a-4950-bb4b-550470883134-c000-b552fb34-6156-4aa8-9085-679ad14fab6e.snappy.parquet?uploads 240b:c1d1:123:664f:c5d2:2:: 8.677ms ↑ 137 B ↓ 724 B
2022-05-30T13:49:31:000 [200 OK] s3.PutObjectPart localhost:9000/minio-feature-testing/spark-job/processed/output-parquet-staging-7/part-00000-ce0a965f-622a-4950-bb4b-550470883134-c000-b552fb34-6156-4aa8-9085-679ad14fab6e.snappy.parquet?uploadId=f3beae8e-3001-48be-9bc4-306b71940e50&partNumber=1 240b:c1d1:123:664f:c5d2:2:: 443.156ms ↑ 51 KiB ↓ 325 B
2022-05-30T13:49:32:000 [200 OK] s3.ListObjectsV2 localhost:9000/minio-feature-testing/?list-type=2&delimiter=%2F&max-keys=2&prefix=spark-job%2Fprocessed%2Foutput-parquet-staging-7%2F_SUCCESS%2F&fetch-owner=false 240b:c1d1:123:664f:c5d2:2:: 3.414ms ↑ 137 B ↓ 646 B
2022-05-30T13:49:32:000 [200 OK] s3.PutObject localhost:9000/minio-feature-testing/spark-job/processed/output-parquet-staging-7/_SUCCESS 240b:c1d1:123:664f:c5d2:2:: 52.734ms ↑ 8.7 KiB ↓ 380 B
2022-05-30T13:49:32:000 [200 OK] s3.DeleteMultipleObjects localhost:9000/minio-feature-testing/?delete 240b:c1d1:123:664f:c5d2:2:: 73.954ms ↑ 350 B ↓ 432 B
2022-05-30T13:49:32:000 [404 Not Found] s3.HeadObject localhost:9000/minio-feature-testing/spark-job/processed/output-parquet-staging-7/_temporary 240b:c1d1:123:664f:c5d2:2:: 2.658ms ↑ 137 B ↓ 291 B
2022-05-30T13:49:32:000 [200 OK] s3.ListObjectsV2 localhost:9000/minio-feature-testing/?list-type=2&delimiter=%2F&max-keys=2&prefix=spark-job%2Fprocessed%2Foutput-parquet-staging-7%2F_temporary%2F&fetch-owner=false 240b:c1d1:123:664f:c5d2:2:: 4.807ms ↑ 137 B ↓ 648 B
2022-05-30T13:49:32:000 [200 OK] s3.ListMultipartUploads localhost:9000/minio-feature-testing/?uploads&prefix=spark-job%2Fprocessed%2Foutput-parquet-staging-7%2F 240b:c0e0:102:553e:b4c2:2:: 1.081ms ↑ 137 B ↓ 776 B
2022-05-30T13:49:32:000 [404 Not Found] s3.HeadObject localhost:9000/minio-feature-testing/spark-job/processed/output-parquet-staging-7/.spark-staging-ce0a965f-622a-4950-bb4b-550470883134 240b:c1d1:123:664f:c5d2:2:: 5.68ms ↑ 137 B ↓ 291 B
2022-05-30T13:49:32:000 [200 OK] s3.ListObjectsV2 localhost:9000/minio-feature-testing/?list-type=2&delimiter=%2F&max-keys=2&prefix=spark-job%2Fprocessed%2Foutput-parquet-staging-7%2F.spark-staging-ce0a965f-622a-4950-bb4b-550470883134%2F&fetch-owner=false 240b:c1d1:123:664f:c5d2:2:: 2.452ms ↑ 137 B ↓ 689 B
{code}
Here, after s3.PutObjectPart there is no CompleteMultipartUpload call for the third step.

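To check a longer trace for the same symptom, a small helper can pair each s3.PutObjectPart with a later s3.CompleteMultipartUpload by uploadId. The following is only a sketch: the class name PendingUploadFinder and the method findUncompleted are hypothetical, not part of Hadoop or MinIO, and the pattern assumes trace lines shaped like the ones above, with upload IDs made of hex digits and hyphens.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not Hadoop or MinIO code): scans S3 trace lines and
// reports upload IDs that received parts but were never completed.
class PendingUploadFinder {
    // Captures the operation name and the uploadId query parameter of one line.
    // Assumes upload IDs consist of hex digits and hyphens, as in the trace above.
    private static final Pattern LINE = Pattern.compile(
        "s3\\.(PutObjectPart|CompleteMultipartUpload)\\s+\\S*[?&]uploadId=([0-9a-fA-F-]+)");

    static Set<String> findUncompleted(List<String> traceLines) {
        Set<String> withParts = new LinkedHashSet<>();
        Set<String> completed = new LinkedHashSet<>();
        for (String line : traceLines) {
            Matcher m = LINE.matcher(line);
            if (!m.find()) {
                continue; // not a part upload or completion line
            }
            if (m.group(1).equals("PutObjectPart")) {
                withParts.add(m.group(2));
            } else {
                completed.add(m.group(2));
            }
        }
        // Whatever is left had parts uploaded but no completion call.
        withParts.removeAll(completed);
        return withParts;
    }

    public static void main(String[] args) {
        // Shortened, made-up trace lines in the same shape as the real log.
        List<String> trace = List.of(
            "[200 OK] s3.PutObjectPart localhost:9000/bucket/a.parquet?uploadId=aaaa-1111&partNumber=1",
            "[200 OK] s3.PutObjectPart localhost:9000/bucket/b.parquet?uploadId=bbbb-2222&partNumber=1",
            "[200 OK] s3.CompleteMultipartUpload localhost:9000/bucket/a.parquet?uploadId=aaaa-1111");
        System.out.println(findUncompleted(trace)); // prints [bbbb-2222]
    }
}
```

Run against the trace above, this would report the single upload ID f3beae8e-3001-48be-9bc4-306b71940e50, since no completion line refers to it.
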
S3 logs after implementing the third step:
{code:java}
2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads 240b:c1d1:123:664f:c5d2:2:: 9.116ms ↑ 137 B ↓ 750 B
2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D45/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads 240b:c1d1:123:664f:c5d2:2:: 9.416ms ↑ 137 B ↓ 751 B
2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D45/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads 240b:c1d1:123:664f:c5d2:2:: 8.506ms ↑ 137 B ↓ 751 B
2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads 240b:c1d1:123:664f:c5d2:2:: 9.815ms ↑ 137 B ↓ 750 B
2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D30/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads 240b:c1d1:123:664f:c5d2:2:: 10.09ms ↑ 137 B ↓ 751 B
2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D15/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads 240b:c1d1:123:664f:c5d2:2:: 9.851ms ↑ 137 B ↓ 751 B
2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D17/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads 240b:c1d1:123:664f:c5d2:2:: 9.006ms ↑ 137 B ↓ 750 B
2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D15/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads 240b:c1d1:123:664f:c5d2:2:: 9.217ms ↑ 137 B ↓ 751 B
2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D45/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=7da87f0a-f8ff-4f9c-b877-b2fdd18d3c5f&partNumber=1 240b:c1d1:123:664f:c5d2:2:: 817.474ms ↑ 52 KiB ↓ 325 B
2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D15/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=782769d0-43f1-43b8-aae0-54ac4c8c6603&partNumber=1 240b:c1d1:123:664f:c5d2:2:: 818.363ms ↑ 85 KiB ↓ 325 B
2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D17/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=2c509073-e2b6-4d0a-a65a-bb4f154a432c&partNumber=1 240b:c1d1:123:664f:c5d2:2:: 819.765ms ↑ 54 KiB ↓ 325 B
2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=c7e09609-6193-4d41-bc05-4020291725e4&partNumber=1 240b:c1d1:123:664f:c5d2:2:: 818.782ms ↑ 55 KiB ↓ 325 B
2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D15/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=3bb4278e-455a-4dc4-af01-ed3227430590&partNumber=1 240b:c1d1:123:664f:c5d2:2:: 817.97ms ↑ 51 KiB ↓ 325 B
2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=8fe799e3-c712-43b7-a074-a2359232de07&partNumber=1 240b:c1d1:123:664f:c5d2:2:: 819.183ms ↑ 80 KiB ↓ 325 B
2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D45/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=c2e1477b-5457-4cbe-8fdb-4e80eaca63fe&partNumber=1 240b:c1d1:123:664f:c5d2:2:: 818.126ms ↑ 53 KiB ↓ 325 B
2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D30/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=992167c8-fbde-4a0d-bd4d-5ce7ddd51a87&partNumber=1 240b:c1d1:123:664f:c5d2:2:: 818.176ms ↑ 56 KiB ↓ 325 B
2022-06-17T10:56:12:000 [200 OK] s3.CompleteMultipartUpload localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D45/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=7da87f0a-f8ff-4f9c-b877-b2fdd18d3c5f 240b:c1d1:123:664f:c5d2:2:: 632.761ms ↑ 272 B ↓ 1.1 KiB
2022-06-17T10:56:13:000 [200 OK] s3.NewMultipartUpload localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D17/quarter%3D15/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads 240b:c1d1:123:664f:c5d2:2:: 6.231ms ↑ 137 B ↓ 751 B
2022-06-17T10:56:12:000 [200 OK] s3.CompleteMultipartUpload localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D15/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=3bb4278e-455a-4dc4-af01-ed3227430590 240b:c1d1:123:664f:c5d2:2:: 697.946ms ↑ 272 B ↓ 1.1 KiB
2022-06-17T10:56:12:000 [200 OK] s3.CompleteMultipartUpload localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D17/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=2c509073-e2b6-4d0a-a65a-bb4f154a432c 240b:c1d1:123:664f:c5d2:2:: 714.377ms ↑ 272 B ↓ 1.1 KiB
{code}
What needs to be implemented: after the uploadPart calls have finished and all upload IDs have been added to commitData, innerCommit should be called.

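The three-step protocol, and why the destination stays empty when the last step never runs, can be illustrated with a toy in-memory store. This is a sketch under stated assumptions, not S3A or AWS SDK code: InMemoryMultipartStore and its methods initiate, uploadPart, complete, exists, read and pendingUploads are hypothetical names invented for the example.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

// Toy model of a multipart-capable object store (hypothetical, for illustration only).
class InMemoryMultipartStore {
    private final Map<String, byte[]> objects = new HashMap<>();       // visible objects by key
    private final Map<String, String> uploadKeys = new HashMap<>();    // uploadId -> destination key
    private final Map<String, TreeMap<Integer, byte[]>> uploadParts = new HashMap<>();

    // Step 1: start an upload and hand back its ID.
    String initiate(String key) {
        String uploadId = UUID.randomUUID().toString();
        uploadKeys.put(uploadId, key);
        uploadParts.put(uploadId, new TreeMap<>());
        return uploadId;
    }

    // Step 2: store one part; nothing becomes visible under the key yet.
    void uploadPart(String uploadId, int partNumber, byte[] data) {
        TreeMap<Integer, byte[]> parts = uploadParts.get(uploadId);
        if (parts == null) {
            throw new IllegalArgumentException("Unknown upload: " + uploadId);
        }
        parts.put(partNumber, data.clone());
    }

    // Step 3: merge the parts in part-number order and publish the object.
    void complete(String uploadId) {
        TreeMap<Integer, byte[]> parts = uploadParts.remove(uploadId);
        if (parts == null) {
            throw new IllegalArgumentException("Unknown upload: " + uploadId);
        }
        String key = uploadKeys.remove(uploadId);
        ByteArrayOutputStream merged = new ByteArrayOutputStream();
        for (byte[] part : parts.values()) {
            merged.write(part, 0, part.length);
        }
        objects.put(key, merged.toByteArray());
    }

    boolean exists(String key) { return objects.containsKey(key); }

    byte[] read(String key) { return objects.get(key); }

    int pendingUploads() { return uploadParts.size(); }

    public static void main(String[] args) {
        InMemoryMultipartStore store = new InMemoryMultipartStore();
        String id = store.initiate("output/part-00000.parquet");
        store.uploadPart(id, 1, new byte[] {1, 2, 3});
        System.out.println(store.exists("output/part-00000.parquet")); // prints false
        store.complete(id);
        System.out.println(store.exists("output/part-00000.parquet")); // prints true
    }
}
```

In this model, an upload that stops after step 2 remains in the pending set and never appears under its key, which matches the behaviour described in the first trace.
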