Hi Praneeth,

Thank you for sharing such a detailed report. This is very encouraging, and
kudos for a significant contribution.

Under future work, you mentioned that you plan to move data from a local
machine to Jetstream2. Do you plan to push it over SSH/SCP or the Swift
protocol? Including SCP in your benchmarks would be great, as moving data from
a local machine to a remote HPC cluster is a very common use case for Airavata
users.

Cheers,
Suresh

> On Mar 17, 2023, at 12:22 AM, Chityala, Praneeth <[email protected]> wrote:
> 
> Dear All,
>  
> This is Praneeth Chityala. I am currently pursuing a master’s in Computer
> Science at Indiana University Bloomington. As part of my independent study, I
> took up MFT as my research area and started studying its architecture.
>  
> As many of you know, MFT uses agents to transfer data from one cloud storage
> to another. These agents can be deployed on any compute machine. The machine
> on which an agent is deployed may itself hold data files that need to be
> uploaded to cloud storage, and that is where my involvement in the project
> came in. I worked on implementing the extensions below:
> - Implemented the Local transport extension, which allows an agent to
>   transfer data from its host machine to a given storage: Local transport
>   extension
>   <https://github.com/apache/airavata-mft/tree/master/transport/local-transport/src/main/java/org/apache/airavata/mft/transport/local>
> - The transport has three variations: streaming, chunked file transfer and
>   chunked streaming
> - Implemented the CLI for configuring a local agent: Local agent CLI
>   <https://github.com/apache/airavata-mft/tree/master/python-cli/mft_cli/airavata_mft_cli>
>  
> Performance testing results:
>  
> After successfully testing transfers from my local machine to AWS S3 storage,
> I deployed an agent on an AWS EC2 machine and ran multiple tests to compare
> its performance with rclone and the AWS CLI.
> The chart below shows the average transfer speeds from our analysis.
>  
> <image001.png>
>  
>  
> For files from 100MB to 1GB, MFT is more than 60% faster than rclone and more
> than 150% faster than the AWS CLI.
>  
> Configurations of the testing:
>  
> Local Machine: An Ubuntu EC2 VM on AWS (instance type c5.9xlarge) with 18
> cores, 10Gbps dedicated network bandwidth and 1GBps disk read/write speed.
>  
> Cloud Storage: AWS S3 bucket in the same region as above VM.
>  
> Test sets: In the x-axis labels of the graph, 10m_1000 means a test set of
> 1000 files of 10MB each. All other test sets follow the same naming convention.
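
The naming convention above can be decoded mechanically. The following is a minimal Python sketch, not part of the benchmark scripts: the function name `parse_test_set` and the `k`/`g` unit letters are assumptions for illustration (the report only shows `m`), and 1MB is taken as 10^6 bytes.

```python
import re

# Assumed unit letters; only "m" (MB) appears in the report.
UNIT_BYTES = {"k": 10**3, "m": 10**6, "g": 10**9}


def parse_test_set(label):
    """Split a label such as "10m_1000" into (bytes per file, file count)."""
    match = re.fullmatch(r"(\d+)([kmg])_(\d+)", label.lower())
    if match is None:
        raise ValueError(f"unrecognised test set label: {label!r}")
    size, unit, count = match.groups()
    return int(size) * UNIT_BYTES[unit], int(count)


file_bytes, file_count = parse_test_set("10m_1000")
total_bytes = file_bytes * file_count  # data volume moved by one run
```

Under these assumptions, 10m_1000 describes 1000 files of 10,000,000 bytes each, or about 10GB per run.
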
>  
> Testing trials: Each test was run 5 times for each transfer method.
>  
> Testing presets: Before each test, the VM’s page cache is cleared so that
> none of the tests benefits from faster reads out of the cache. This is done
> to simulate the worst possible conditions while reading data.
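
On Linux, the page cache is commonly dropped by calling sync and then writing to /proc/sys/vm/drop_caches. The report does not say which mechanism was used, so the sketch below is only an assumption of how such a preset could look. The function name and its `path` parameter are hypothetical, and writing to the real file requires root.

```python
import os

DROP_CACHES = "/proc/sys/vm/drop_caches"  # standard Linux kernel interface


def drop_page_cache(path=DROP_CACHES):
    """Flush dirty pages, then ask the kernel to drop its clean caches.

    Writing "3" drops the page cache plus dentries and inodes.
    `path` is a parameter only so the function can be exercised
    against an ordinary file without root privileges.
    """
    os.sync()  # write dirty pages to disk first so they can be dropped
    with open(path, "w") as handle:
        handle.write("3\n")
```

Calling `drop_page_cache()` as root before every run would make each transfer read its files from disk rather than from memory.
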
>  
> MFT configuration: I used chunked streaming with
> - 20MB as chunk size
> - 32 concurrent transfers
> - 32 concurrent chunked threads
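
To make the chunk size concrete, here is a minimal sketch of how a file could be split into 20MB byte ranges that concurrent workers then upload. It is an illustration only and not MFT’s actual implementation: `plan_chunks` is a hypothetical helper, and 20MB is taken as 20 * 10^6 bytes.

```python
CHUNK_SIZE = 20 * 10**6  # 20MB, as in the configuration above (decimal MB assumed)


def plan_chunks(file_size, chunk_size=CHUNK_SIZE):
    """Return the (offset, length) byte ranges that cover a file of file_size bytes."""
    chunks = []
    offset = 0
    while offset < file_size:
        # The final chunk may be shorter than chunk_size.
        length = min(chunk_size, file_size - offset)
        chunks.append((offset, length))
        offset += length
    return chunks


chunks_1gb = plan_chunks(10**9)
```

Under these assumptions a 1GB file yields 50 chunks, so 32 concurrent chunk threads would need at most two passes to cover it.
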
>  
> rclone configuration: After exploring the many optimizations available for
> rclone, I used the following settings:
> --s3-chunk-size 128000
> --buffer-size 128000
> --s3-upload-cutoff 0
> --s3-upload-concurrency 32
> --multi-thread-streams 32
> --multi-thread-cutoff 0
> --s3-disable-http2
> --no-check-dest
> --transfers 32
> --fast-list
>  
> AWS CLI configuration: I used the native AWS CLI for the transfers, as we did
> not find many dedicated optimizations for it.
>  
> Observations:
> For local transport I used BufferedStreaming, which helped MFT reach the
> maximum read speed from local disk without hitting the IOPS limit.
>  
> Future plans for testing:
> - Jetstream2: Replace AWS EC2 with a Jetstream2 virtual machine and perform
>   similar tests.
> - Emulab: Repeat the same tests using Emulab VMs and custom configurations,
>   with the help of Dimuthu.
> - Azure: Test local-to-Azure cloud storage transfers with MFT, rclone and the
>   Azure CLI.
> - GCP: Test local-to-GCS transfers with MFT, rclone and the GCP CLI.
> - I have a different implementation of the MFT local transport for systems
>   that support DMA (Direct Memory Access). We also plan to test on such
>   systems; the present EC2 system doesn’t support DMA.
>  
> Further Improvements of MFT:
> As we noticed that MFT lags behind rclone in speed for files of 1MB or
> smaller, we plan to stress test and analyze the whole system and improve
> speeds for smaller files.
>  
> Acknowledgement: I thank Dimuthu Wannipurage for clearing many doubts about 
> MFT and providing guidance when needed.
>  
> Thank you and please let us know your comments or thoughts. 
>  
> Best,
> Praneeth Chityala
