Hi Suresh,

It is something that I am considering for a future phase, after the tasks 
related to local transport are completely finished. For now, I am just 
planning to replace EC2 with a Jetstream2 VM and transport data from that VM to 
S3.

Best,
Praneeth

From: Suresh Marru <sma...@apache.org>
Reply-To: "dev@airavata.apache.org" <dev@airavata.apache.org>
Date: Monday, March 20, 2023 at 2:12 PM
To: Airavata Dev <dev@airavata.apache.org>
Subject: [External] Re: Airavata-MFT | Local transport implementation

Hi Praneeth,

Thank you for sharing a detailed report. This is very encouraging, and kudos 
for a significant contribution.

In future work, you mentioned that you plan to move data from a local machine 
to Jetstream2. Do you plan to push it over SSH/SCP or the Swift protocol? 
Including SCP in your benchmarks would be great, as moving data from a local 
machine to a remote HPC cluster is a very common use case for Airavata users.

Cheers,
Suresh


On Mar 17, 2023, at 12:22 AM, Chityala, Praneeth <pkchi...@iu.edu> wrote:

Dear All,

This is Praneeth Chityala, currently pursuing a master's in Computer Science at 
Indiana University Bloomington. As part of my independent study, I took up MFT 
as the research area and started understanding the architecture.

As many of you know, MFT uses agents to transfer data from one cloud storage to 
another. These agents can be deployed on any compute machine. The machine on 
which an agent is deployed may itself hold data files that need to be uploaded 
to cloud storage, and that is where my involvement in the project came in. 
I worked on implementing the extensions below:

  *   Implemented the Local transport extension to allow an agent to transfer 
data from its host machine to a given storage – Local transport 
extension<https://github.com/apache/airavata-mft/tree/master/transport/local-transport/src/main/java/org/apache/airavata/mft/transport/local>

     *   Transport has three variations – streaming, chunked file transfer and 
chunked streaming

  *   Implemented the CLI for configuring the local agent – Local agent 
CLI<https://github.com/apache/airavata-mft/tree/master/python-cli/mft_cli/airavata_mft_cli>

Performance testing results:

After successfully testing from my local machine to AWS S3 storage, I deployed 
the agent on an AWS EC2 machine and ran multiple tests to compare its 
performance with rclone and AWS cli.
The chart below shows the average transfer speeds from our analysis.

<image001.png>


For files from 100MB to 1GB, MFT is more than 60% faster than rclone and more 
than 150% faster than AWS cli.

Configurations of the testing:


  *   Local Machine: An Ubuntu EC2 VM on AWS (instance type c5.9xlarge) with 
18 cores, 10Gbps dedicated network speed and 1GBps read/write speed to disk.


  *   Cloud Storage: AWS S3 bucket in the same region as above VM.


  *   Test sets: In the x-axis labels of the graph, 10m_1000 means a test set 
of 1000 files of 10MB each. All other test sets follow the same naming convention.


  *   Testing trials: Each test was run 5 times for each transfer method.


  *   Testing presets: Before each test, the VM's page cache is cleared so that 
none of the tests benefit from faster reads served out of the page cache. This 
is done to simulate the worst possible conditions while reading data.


  *   MFT configuration: I used chunked streaming with

     *   20MB as chunk size
     *   32 concurrent transfers
     *   32 concurrent chunked threads


  *   rclone configuration: After exploring many of the optimizations 
available for rclone, I used the following settings:

     *   --s3-chunk-size 128000
     *   --buffer-size 128000
     *   --s3-upload-cutoff 0
     *   --s3-upload-concurrency 32
     *   --multi-thread-streams 32
     *   --multi-thread-cutoff 0
     *   --s3-disable-http2
     *   --no-check-dest
     *   --transfers 32
     *   --fast-list


  *   AWS cli configuration: I used the native AWS cli for transfers, as we did 
not find many dedicated optimizations for it
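
To make the test-set naming and the chunk size above concrete, here is a minimal Python sketch. It is not part of MFT. It assumes labels have the form `<size><unit>_<count>`, that the unit letters `k`, `m`, `g` mean decimal KB, MB and GB, and that a file is split into fixed 20MB chunks. The function names and the `1g_10` label are made up for illustration; only `10m_1000` appears in the report.

```python
import re

# Assumption: decimal units (1 MB = 10**6 bytes). The report does not say
# whether sizes are decimal or binary.
_UNITS = {"k": 10**3, "m": 10**6, "g": 10**9}
_LABEL = re.compile(r"^(\d+)([kmg])_(\d+)$")

CHUNK_SIZE = 20 * 10**6  # 20MB chunk size from the MFT configuration


def parse_test_set(label):
    """Return (file_size_bytes, file_count) for a label such as '10m_1000'."""
    match = _LABEL.match(label.lower())
    if match is None:
        raise ValueError(f"unrecognized test set label: {label!r}")
    size, unit, count = match.groups()
    return int(size) * _UNITS[unit], int(count)


def chunks_per_file(file_size, chunk_size=CHUNK_SIZE):
    """Number of fixed-size chunks needed to cover one file (at least one)."""
    return max(1, (file_size + chunk_size - 1) // chunk_size)


if __name__ == "__main__":
    # '1g_10' is a hypothetical label used only to show a multi-chunk file.
    for label in ("10m_1000", "1g_10"):
        size, count = parse_test_set(label)
        print(label, size, count, chunks_per_file(size))
```

Under these assumptions a 10MB file fits in a single 20MB chunk, so chunk-level parallelism would only come into play for the larger test sets.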

Observations:

  *   For local transport I used BufferedStreaming, which helped MFT reach the 
maximum read speed of the local disk without hitting the max IOPS.
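
The general idea behind buffered reading can be sketched as follows. This is a Python analogue for illustration only, not the MFT implementation (which is written in Java). The function name and the 8 MiB buffer size are assumptions, not values from the tests. When the caller asks for pieces smaller than the buffer, the file object refills its buffer with large reads, so the disk sees fewer I/O requests for the same amount of data.

```python
import os
import tempfile

BUFFER_SIZE = 8 * 1024 * 1024  # hypothetical 8 MiB buffer, not a measured value


def read_in_chunks(path, chunk_size, buffer_size=BUFFER_SIZE):
    """Yield successive chunks of a file, read through a large buffer."""
    with open(path, "rb", buffering=buffer_size) as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            yield chunk


if __name__ == "__main__":
    # Create a small throwaway file so the sketch can run anywhere.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 50_000)
        path = tmp.name
    try:
        total = sum(len(chunk) for chunk in read_in_chunks(path, 4096))
        print("bytes read:", total)
    finally:
        os.remove(path)
```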

Future plans for testing:

  *   Jetstream2: Planning to replace the AWS EC2 instance with a Jetstream2 
virtual machine and perform similar tests
  *   Emulab: Simulate the same testing using Emulab VMs and custom 
configurations with the help of Dimuthu.
  *   Azure: Perform local to Azure cloud storage testing with MFT, rclone and 
Azure cli
  *   GCP: Perform local to GCS testing with MFT, rclone and GCP cli
  *   I have a different implementation of MFT local transport for systems that 
support DMA (Direct Memory Access). We also plan to test on such systems; the 
present EC2 system doesn’t support DMA.

Further Improvements of MFT:

  *   As we noticed that MFT lags behind rclone in speed for files of 1MB or 
smaller, we plan to stress-test and analyze the whole system and improve speeds 
for smaller files

Acknowledgement: I thank Dimuthu Wannipurage for clearing many doubts about MFT 
and providing guidance when needed.

Thank you and please let us know your comments or thoughts.

Best,
Praneeth Chityala