Dawnpool opened a new pull request, #46582: URL: https://github.com/apache/airflow/pull/46582
This PR improves the SFTP hook's `store_directory` and `retrieve_directory` functions so that they use a single connection when transferring a directory that contains multiple files.

Previously, these functions relied on the `store_file` and `retrieve_file` functions. Since this [PR](https://github.com/apache/airflow/pull/44247), `store_file` and `retrieve_file` open and close an SFTP connection on every call. As a result, `store_directory` and `retrieve_directory` repeatedly open and close one connection per file, which adds significant overhead when a directory contains many files.

To address this, I modified them to open a single connection, transfer all files in the directory, and then close the connection.

I also ran a performance test in my local environment by transferring a directory containing 1,000 small files. The change reduced the transfer time by approximately 8-9 seconds (7.83 sec and 8.77 sec in the two runs below). The results are shown below.

| Operation | AS-IS | TO-BE |
|--------|-----------|-----------|
| store | 47.18 sec | 39.35 sec |
| delete | 63.50 sec | 54.73 sec |

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
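The connection-reuse idea described in this PR can be illustrated with a minimal Python sketch. It assumes a hypothetical hook: `SketchSFTPHook`, `open_connection`, `store_directory_per_file`, `store_directory_single_conn`, and `FakeSFTPClient` are illustrative names, not Airflow's `SFTPHook` API, and the fake client copies files on the local filesystem instead of talking to an SFTP server. It only counts connections to show why delegating to a per-file method opens one connection per file, while the single-connection version opens one per directory. Unlike the real `store_directory`, it handles only a flat directory and does not walk subdirectories.

```python
import os
import tempfile
from contextlib import contextmanager


class FakeSFTPClient:
    """Hypothetical stand-in for an SFTP client: "remote" paths are local paths."""

    def put(self, localpath: str, remotepath: str) -> None:
        # Copy the local file to the (simulated) remote location.
        with open(localpath, "rb") as src, open(remotepath, "wb") as dst:
            dst.write(src.read())


class SketchSFTPHook:
    """Hypothetical hook that records how many connections it opens and closes."""

    def __init__(self) -> None:
        self.opened = 0
        self.closed = 0

    @contextmanager
    def open_connection(self):
        # A real hook would open an SFTP session here and close it in `finally`.
        self.opened += 1
        try:
            yield FakeSFTPClient()
        finally:
            self.closed += 1

    def store_file(self, remote_full_path: str, local_full_path: str) -> None:
        # Per-file pattern: every call opens and closes its own connection.
        with self.open_connection() as conn:
            conn.put(local_full_path, remote_full_path)

    def store_directory_per_file(self, remote_dir: str, local_dir: str) -> None:
        # AS-IS pattern: delegate to store_file, so N files -> N connections.
        for name in sorted(os.listdir(local_dir)):
            self.store_file(os.path.join(remote_dir, name), os.path.join(local_dir, name))

    def store_directory_single_conn(self, remote_dir: str, local_dir: str) -> None:
        # TO-BE pattern: open once, transfer every file, then close once.
        with self.open_connection() as conn:
            for name in sorted(os.listdir(local_dir)):
                conn.put(os.path.join(local_dir, name), os.path.join(remote_dir, name))


def count_connections(n_files: int) -> tuple:
    """Return (connections opened AS-IS, connections opened TO-BE) for n_files files."""
    with tempfile.TemporaryDirectory() as local_dir, tempfile.TemporaryDirectory() as remote_dir:
        for i in range(n_files):
            with open(os.path.join(local_dir, f"file_{i}.txt"), "w") as f:
                f.write(f"payload {i}")
        as_is, to_be = SketchSFTPHook(), SketchSFTPHook()
        as_is.store_directory_per_file(remote_dir, local_dir)
        to_be.store_directory_single_conn(remote_dir, local_dir)
        # Every opened connection must also have been closed.
        assert as_is.opened == as_is.closed and to_be.opened == to_be.closed
        assert len(os.listdir(remote_dir)) == n_files
        return as_is.opened, to_be.opened


if __name__ == "__main__":
    print(count_connections(1000))  # prints (1000, 1)
```

With 1,000 files, the per-file pattern pays the connection setup and teardown cost 1,000 times, while the single-connection pattern pays it once. This is the overhead the PR removes; the actual time saved depends on the server and network.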
