Dawnpool opened a new pull request, #46582:
URL: https://github.com/apache/airflow/pull/46582

   This PR changes the SFTP hook's `store_directory` and `retrieve_directory` 
functions to use a single connection when transferring a directory with 
multiple files.
   
   Previously, these functions relied on the `store_file` and `retrieve_file` 
functions. Since this [PR](https://github.com/apache/airflow/pull/44247), 
`store_file` and `retrieve_file` open and close an SFTP connection on every 
call.
   As a result, `store_directory` and `retrieve_directory` open and close a 
connection repeatedly when a directory contains many files, which causes 
significant overhead.
   
   To address this, I modified them to open a connection, transfer all files in 
a directory, and then close the connection afterward.
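
   The idea above can be sketched roughly as follows. This is only an 
illustration, not the code in this PR: `managed_conn` and the `connect` 
factory are hypothetical names, and the client is assumed to expose `put`, 
`mkdir`, and `close` in the style of paramiko's `SFTPClient`.

```python
import os
from contextlib import contextmanager


@contextmanager
def managed_conn(connect):
    """Open one connection via the hypothetical `connect()` factory and
    always close it on exit, even if a transfer raises."""
    conn = connect()
    try:
        yield conn
    finally:
        conn.close()


def store_directory(connect, remote_dir, local_dir):
    """Upload every file under local_dir over a single connection.

    Assumes remote_dir already exists on the server. Returns the list of
    remote paths that were written.
    """
    uploaded = []
    with managed_conn(connect) as conn:
        for root, dirs, files in os.walk(local_dir):
            rel = os.path.relpath(root, local_dir)
            if rel == ".":
                remote_root = remote_dir
            else:
                remote_root = f"{remote_dir}/{rel.replace(os.sep, '/')}"
            # Create subdirectories before descending into them.
            for name in sorted(dirs):
                conn.mkdir(f"{remote_root}/{name}")
            # Reuse the same connection for every file in this directory.
            for name in sorted(files):
                remote_path = f"{remote_root}/{name}"
                conn.put(os.path.join(root, name), remote_path)
                uploaded.append(remote_path)
    return uploaded
```

   With this shape, the connection is opened once per directory transfer 
rather than once per file, and the `finally` clause still closes it if an 
individual `put` fails.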
   
   I also ran a performance test in my local environment, transferring a 
directory containing 1,000 small files. The change reduced the transfer time 
by approximately 8-9 seconds. The results are shown below.
   
   |        | AS-IS     | TO-BE     |
   |--------|-----------|-----------|
   | store  | 47.18 sec | 39.35 sec |
   | delete | 63.50 sec | 54.73 sec |
   

