ReemaAlzaid opened a new pull request, #12305: URL: https://github.com/apache/gluten/pull/12305
## What changes were proposed in this pull request?

This PR adds Gluten-side support for multi-threaded asynchronous multipart upload for S3-compatible object storage in the Velox backend.

When enabled, Gluten uses an async S3 write path that uploads multipart upload parts through a shared upload thread pool. The number of in-flight part uploads per file and the upload thread pool size are configurable. The existing synchronous S3 write path remains the default behavior.

This PR also adds a benchmark for comparing synchronous and asynchronous S3 multipart upload performance.

https://github.com/apache/gluten/issues/12303

## How was this patch tested?

Built the Velox backend with S3 enabled and ran the S3 async upload benchmark against MinIO.

Example benchmark command:

```bash
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
export GLUTEN_S3_BENCH_BUCKET=writedata
export GLUTEN_S3_BENCH_ENDPOINT=http://127.0.0.1:9000
export GLUTEN_S3_BENCH_REGION=us-east-1
export GLUTEN_S3_BENCH_PATH_STYLE_ACCESS=true
export GLUTEN_S3_BENCH_SSL_ENABLED=false
export GLUTEN_S3_BENCH_MAX_CONCURRENCY=8
export GLUTEN_S3_BENCH_UPLOAD_THREADS=8
export GLUTEN_S3_BENCH_MIN_PART_SIZE=32MB

./cpp/build/velox/benchmarks/s3_async_upload_benchmark \
  --bm_min_iters=1 \
  --bm_regex='(sync|async)_upload_(16|32|64|128|256|512|1024)M'

============================================================================
[...]/benchmarks/S3AsyncUploadBenchmark.cc      relative  time/iter  iters/s
============================================================================
sync_upload_16M                                             80.58ms    12.41
async_upload_16M                                 128.28%    62.81ms    15.92
sync_upload_32M                                            117.05ms     8.54
async_upload_32M                                 93.949%   124.59ms     8.03
sync_upload_64M                                            204.63ms     4.89
async_upload_64M                                 132.00%   155.03ms     6.45
sync_upload_128M                                           360.72ms     2.77
async_upload_128M                                135.32%   266.56ms     3.75
sync_upload_256M                                           709.47ms     1.41
async_upload_256M                                203.33%   348.93ms     2.87
sync_upload_512M                                              1.50s  667.72m
async_upload_512M                                237.77%   629.87ms     1.59
sync_upload_1024M                                             2.90s  345.41m
async_upload_1024M                               313.82%   922.55ms     1.08
```
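
For readers unfamiliar with the pattern, the async write path described in this PR (parts uploaded through a shared thread pool, with a configurable cap on in-flight parts per file) can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the Gluten or Velox code: `AsyncMultipartWriter`, `upload_part`, `max_in_flight`, and `demo` are hypothetical names, and the upload is simulated in memory with no S3 calls.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class AsyncMultipartWriter:
    """Hypothetical sketch: buffers writes and uploads full parts through a
    shared pool, with at most max_in_flight parts uploading per file."""

    def __init__(self, pool, upload_part, part_size, max_in_flight):
        self._pool = pool                  # shared by all writers
        self._upload_part = upload_part    # callable(part_number, data) -> tag
        self._part_size = part_size
        self._slots = threading.Semaphore(max_in_flight)  # per-file limit
        self._buffer = bytearray()
        self._futures = []

    def write(self, data):
        self._buffer.extend(data)
        # Hand off every complete part; keep the remainder buffered.
        while len(self._buffer) >= self._part_size:
            part = bytes(self._buffer[:self._part_size])
            del self._buffer[:self._part_size]
            self._submit(part)

    def _submit(self, part):
        # Blocks the writer while max_in_flight parts are still uploading.
        self._slots.acquire()
        part_number = len(self._futures) + 1   # part numbers start at 1
        future = self._pool.submit(self._upload_part, part_number, part)
        future.add_done_callback(lambda _: self._slots.release())
        self._futures.append(future)

    def close(self):
        if self._buffer:                   # final, possibly short, part
            self._submit(bytes(self._buffer))
            self._buffer.clear()
        # result() re-raises upload errors; the list is in part-number order.
        return [f.result() for f in self._futures]


def demo():
    """Uploads 10 bytes as 4-byte parts into a dict standing in for S3."""
    uploaded = {}

    def fake_upload(part_number, data):
        uploaded[part_number] = data
        return f"etag-{part_number}"

    with ThreadPoolExecutor(max_workers=8) as pool:
        writer = AsyncMultipartWriter(pool, fake_upload,
                                      part_size=4, max_in_flight=2)
        writer.write(b"abcdefghij")
        tags = writer.close()
    return tags, uploaded
```

In this sketch `max_workers` plays the role of the upload thread pool size and `max_in_flight` the role of the per-file in-flight limit. The tiny `part_size` is only for the demo. The semaphore also gives the writer backpressure, so a fast producer cannot queue an unbounded number of buffered parts.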
