Hi Praneeth, SCP transport is now available in MFT [1] and working fine with S3. You can update your load testing scripts to measure the performance.
[1] https://github.com/apache/airavata-mft/commit/a61fa4a34129cf0e7e807ee23461237a8a72af22

Thanks
Dimuthu

On Mon, Mar 20, 2023 at 3:43 PM Chityala, Praneeth <pkchi...@iu.edu> wrote:

> Hi Marcus,
>
> Thank you. The Local plugin for downloading data has been implemented, but I have not yet tested its speeds. I will add this to my future performance testing.
>
> Best,
> Praneeth
>
> On 3/20/23, 2:43 PM, "Christie, Marcus Aaron" <machr...@iu.edu <mailto:machr...@iu.edu>> wrote:
>
> Hi Praneeth,
>
> This looks like a great contribution to MFT and I appreciate your write up here.
>
> One question: these numbers are for uploading from the local EC2 instance to S3, correct? Did you do any analysis on the opposite, downloading from S3 to a local EC2 instance?
>
> Thanks,
>
> Marcus
>
> > On Mar 17, 2023, at 12:22 AM, Chityala, Praneeth <pkchi...@iu.edu <mailto:pkchi...@iu.edu>> wrote:
> >
> > Dear All,
> >
> > This is Praneeth Chityala, currently pursuing a master's in Computer Science at Indiana University Bloomington. As part of my independent study I took up MFT as my research area and started by understanding the architecture.
> >
> > As many of you know, MFT uses Agents to transfer data from one cloud storage to another. These agents can be deployed on any compute machine. The machine on which an agent is deployed might itself hold data files that need to be uploaded to cloud storage; that is where my involvement in the project came in.
> > I worked on implementing the following extensions:
> >
> > • Implemented the Local transport extension to allow an agent to transfer data from its host machine to a given storage – Local transport extension
> > • The transport has three variations – streaming, chunked file transfer and chunked streaming
> > • Implemented the CLI for configuring the local agent – Local agent CLI
> >
> > Performance testing results:
> >
> > After successfully testing from my local machine to AWS S3 storage, I deployed the agent on an AWS EC2 machine and ran multiple tests to compare its performance with rclone and the AWS CLI.
> >
> > The chart below shows the average transfer speeds from our analysis.
> >
> > <image001.png>
> >
> > For files from 100MB to 1GB, MFT is more than 60% faster than rclone and more than 150% faster than the AWS CLI.
> >
> > Configurations of the testing:
> >
> > • Local Machine: An Ubuntu EC2 VM on AWS (instance type c5.9xlarge) with 18 cores, 10Gbps dedicated network speed and 1GBps read/write speed to disk.
> >
> > • Cloud Storage: AWS S3 bucket in the same region as the above VM.
> >
> > • Test sets: In the x-axis labels of the graph, 10m_1000 means a test set of 1000 10MB files. All other test sets follow a similar naming convention.
> >
> > • Testing trials: Each test was run 5 times for each transfer method.
> >
> > • Testing presets: Before each test, the VM's cache is cleared so that none of the tests benefit from higher read speeds through page caching. This is done to simulate the worst possible conditions while reading data.
> >
> > • MFT configuration: I used chunked streaming with
> >   • 20MB as chunk size
> >   • 32 concurrent transfers
> >   • 32 concurrent chunked threads
> >
> > • rclone configuration: After exploring many of the optimizations available for rclone, I used the following settings:
> >   • --s3-chunk-size 128000
> >   • --buffer-size 128000
> >   • --s3-upload-cutoff 0
> >   • --s3-upload-concurrency 32
> >   • --multi-thread-streams 32
> >   • --multi-thread-cutoff 0
> >   • --s3-disable-http2
> >   • --no-check-dest
> >   • --transfers 32
> >   • --fast-list
> >
> > • AWS CLI configuration: I used the native AWS CLI for transfers, as our findings did not turn up many dedicated optimizations for it.
> >
> > Observations:
> > • For local transport I used BufferedStreaming, which helped MFT get the maximum read speeds from local disk without hitting the max IOPS.
> >
> > Future plans for testing:
> > • Jetstream2: Planning to replace AWS EC2 with a Jetstream2 virtual machine and perform similar tests.
> > • Emulab: Simulate the same testing using Emulab VMs and custom configurations, with the help of Dimuthu.
> > • Azure: Perform local-to-Azure cloud storage testing with MFT, rclone and the Azure CLI.
> > • GCP: Perform local-to-GCS testing with MFT, rclone and the GCP CLI.
> > • I have a different implementation of the MFT local transport for systems that support DMA (Direct Memory Access). We also plan to test on such systems; the present EC2 system doesn't support DMA.
> >
> > Further improvements of MFT:
> > • We noticed that MFT lags behind rclone in speed for files of 1MB or smaller, so we plan to stress-analyze the whole system and improve speeds for smaller files.
> >
> > Acknowledgement: I thank Dimuthu Wannipurage for clearing up many doubts about MFT and providing guidance when needed.
> >
> > Thank you, and please let us know your comments or thoughts.
> >
> > Best,
> > Praneeth Chityala
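For anyone updating the load testing scripts, here is a rough sketch of how the test-set labels and the 20MB chunk size described in the thread could be interpreted. It is not taken from the MFT code base: the function names are hypothetical, the k/g suffixes are assumed by analogy with the single "10m_1000" example given above, and "MB" is assumed to mean 1024*1024 bytes.

```python
import re

# Assumption: suffixes k/m/g mean KiB/MiB/GiB; the thread only shows "10m_1000".
UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}

# 20MB chunk size quoted in the MFT configuration (binary megabytes assumed).
CHUNK_SIZE = 20 * 1024**2


def parse_test_set(label):
    """Split a label such as '10m_1000' into (file_size_bytes, file_count)."""
    match = re.fullmatch(r"(\d+)([kmg])_(\d+)", label.lower())
    if match is None:
        raise ValueError(f"unrecognized test set label: {label!r}")
    size, unit, count = match.groups()
    return int(size) * UNITS[unit], int(count)


def chunk_ranges(file_size, chunk_size=CHUNK_SIZE):
    """Return (offset, length) pairs that cover a file of file_size bytes."""
    return [
        (offset, min(chunk_size, file_size - offset))
        for offset in range(0, file_size, chunk_size)
    ]


def describe(label):
    """Summarize a test set: file count, total bytes, and chunks per file."""
    file_size, count = parse_test_set(label)
    return {
        "files": count,
        "total_bytes": file_size * count,
        "chunks_per_file": len(chunk_ranges(file_size)),
    }
```

Under these assumptions, describe("10m_1000") reports 1000 files that each fit in a single chunk, while a 100MB file would be split into five 20MB chunks that the concurrent chunk threads could read independently.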