Hi Praneeth,

This looks like a great contribution to MFT, and I appreciate your write-up here.

One question: these numbers are for uploading from the local EC2 instance to 
S3, correct? Did you do any analysis of the opposite direction, downloading 
from S3 to a local EC2 instance?

Thanks,

Marcus

> On Mar 17, 2023, at 12:22 AM, Chityala, Praneeth <pkchi...@iu.edu> wrote:
> 
> Dear All,
>  
> This is Praneeth Chityala, currently pursuing a master's in Computer Science 
> at Indiana University Bloomington. As part of my independent study, I took up 
> MFT as my research area and started by understanding its architecture.
>  
> As many of you know, MFT uses agents to transfer data from one cloud storage 
> to another. These agents can be deployed on any compute machine. The machine 
> on which an agent is deployed may itself hold data files that need to be 
> uploaded to cloud storage; that is where my involvement in the project came 
> in. I worked on implementing the extensions below:
>       • Implemented the Local transport extension, which allows an agent to 
> transfer data from its host machine to a given storage – Local transport extension
>               • The transport has three variations: streaming, chunked file 
> transfer, and chunked streaming
>       • Implemented the CLI for configuring the local agent – Local agent CLI
>  
> Performance testing results:
>  
> After successfully testing from my local machine to AWS S3 storage, I 
> deployed the agent on an AWS EC2 machine and ran multiple tests to compare 
> its performance with rclone and AWS cli.
> The charts below show the average transfer speeds from our analysis.
>  
> <image001.png>
>  
>  
> For files from 100MB to 1GB, MFT is more than 60% faster than rclone and more 
> than 150% faster than AWS cli.
>  
> Configurations of the testing:
>  
>       • Local Machine: An Ubuntu EC2 VM on AWS (instance type c5.9xlarge) 
> with 18 cores, 10Gbps dedicated network speed and 1GBps read/write speed to 
> disk.
>  
>       • Cloud Storage: AWS S3 bucket in the same region as above VM.
>  
>       • Test sets: In the x-axis labels of the graph, 10m_1000 means a test 
> set of 1000 10MB files. All other test sets follow the same naming convention.
>  
>       • Testing trials: Each test is run 5 times for each transfer method.
>  
>       • Testing presets: Before each test, the VM's cache is cleared so that 
> none of the tests benefits from higher read speeds due to page caching. This 
> is done to simulate the worst possible conditions while reading data.
>  
>       • MFT configuration: I used chunked streaming with
>               • 20MB as chunk size
>               • 32 concurrent transfers
>               • 32 concurrent chunked threads
>  
>       • rclone configuration: After exploring many of the optimizations 
> available for rclone, I used the following settings:
>               • --s3-chunk-size 128000
>               • --buffer-size 128000
>               • --s3-upload-cutoff 0
>               • --s3-upload-concurrency 32
>               • --multi-thread-streams 32
>               • --multi-thread-cutoff 0
>               • --s3-disable-http2
>               • --no-check-dest
>               • --transfers 32
>               • --fast-list
>  
>       • AWS cli configuration: I used the native AWS cli as is, since we did 
> not find many dedicated optimizations for it
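To make the test-set naming and the 20MB chunk size concrete, here is a minimal sketch of how a label such as 10m_1000 could be decoded and how many chunks each file would split into. This is only an illustration, not MFT code: the function names are hypothetical, and the "g" suffix and the 1 GB = 1000 MB conversion are assumptions, since the message above only shows the 10m_1000 label.

```python
import math
import re

CHUNK_SIZE_MB = 20  # chunk size quoted in the MFT configuration above


def parse_test_set(label):
    """Split a label such as '10m_1000' into (file_size_mb, file_count)."""
    match = re.fullmatch(r"(\d+)([mg])_(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognized test set label: {label!r}")
    size = int(match.group(1))
    unit = match.group(2)
    count = int(match.group(3))
    # Assumption: a 'g' suffix means GB, and 1 GB is treated as 1000 MB.
    size_mb = size * 1000 if unit == "g" else size
    return size_mb, count


def chunks_per_file(file_size_mb, chunk_size_mb=CHUNK_SIZE_MB):
    """Number of chunks a single file splits into (at least one)."""
    return max(1, math.ceil(file_size_mb / chunk_size_mb))


size_mb, count = parse_test_set("10m_1000")
print(size_mb, count, chunks_per_file(size_mb))  # 10 1000 1
```

Under these assumptions, a 10MB file fits in a single chunk, so chunk-level concurrency only comes into play for files larger than 20MB.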
>  
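For anyone wanting to reproduce the rclone side, the settings listed above could be assembled into a command line roughly as follows. This is a sketch under assumptions: the message does not say which rclone subcommand was used (copy is assumed here), and the source path and remote name are hypothetical placeholders. The flags and values are copied as listed.

```python
import shlex


def rclone_copy_command(source, destination):
    """Build an rclone argv list using the flags listed in the message."""
    flags = [
        "--s3-chunk-size", "128000",
        "--buffer-size", "128000",
        "--s3-upload-cutoff", "0",
        "--s3-upload-concurrency", "32",
        "--multi-thread-streams", "32",
        "--multi-thread-cutoff", "0",
        "--s3-disable-http2",
        "--no-check-dest",
        "--transfers", "32",
        "--fast-list",
    ]
    # Assumption: 'copy' is the subcommand; the message does not name one.
    return ["rclone", "copy", source, destination, *flags]


# Hypothetical local directory and remote; substitute real values.
cmd = rclone_copy_command("/data/10m_1000", "s3remote:bucket/10m_1000")
print(shlex.join(cmd))
```

Keeping the command as a list avoids shell-quoting problems if it is later passed to subprocess.run for timed runs.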
> Observations:
>       • For the local transport, I used BufferedStreaming, which helped MFT 
> reach the maximum read speed from local disk without hitting the max IOPS.
>  
> Future plans for testing:
>       • Jetstream2: Planning to replace AWS EC2 with a Jetstream2 virtual 
> machine and perform similar tests
>       • Emulab: Simulate the same tests using Emulab VMs and custom 
> configurations with the help of Dimuthu.
>       • Azure: Perform local-to-Azure cloud storage testing with MFT, rclone 
> and Azure cli
>       • GCP: Perform local-to-GCS testing with MFT, rclone and GCP cli
>       • I have a different implementation of the MFT local transport for 
> systems that support DMA (Direct Memory Access). We also plan to test on such 
> systems; the present EC2 system doesn’t support DMA.
>  
> Further Improvements of MFT:
>       • As we noticed that MFT lags behind rclone in speed for files of 1MB 
> or smaller, we plan to stress test and analyze the whole system and improve 
> speeds for smaller files
>  
> Acknowledgement: I thank Dimuthu Wannipurage for clearing many doubts about 
> MFT and providing guidance when needed.
>  
> Thank you and please let us know your comments or thoughts. 
>  
> Best,
> Praneeth Chityala
