pquentin opened a new pull request #1353: Reuse TCP connections when uploading 
files
URL: https://github.com/apache/libcloud/pull/1353
 
 
   ## Reuse TCP connections when uploading files
   
   ### Description
   
   It's easy to break connection reuse when using the requests API: just use 
`stream=True` and never read the response body. The connection used to make the 
request will never be reused, and will be dropped when urllib3's connection 
pool is full.
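   
   The effect can be reproduced locally. The sketch below is an illustration only, not part of this change: it assumes `requests` is installed, and `CountingHandler` and `count_connections` are hypothetical names. It starts a throwaway HTTP/1.1 server that counts accepted TCP connections, then issues five `stream=True` requests through one `requests.Session`, either reading each body or leaving it unread.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class CountingHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler that counts accepted TCP connections."""

    protocol_version = "HTTP/1.1"  # lets the server keep connections alive
    connections = 0
    lock = threading.Lock()

    def setup(self):
        # setup() runs once per accepted connection, not once per request.
        with CountingHandler.lock:
            CountingHandler.connections += 1
        super().setup()

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the output quiet


def count_connections(read_body, n=5):
    """Send n streamed GETs through one session; return connections opened."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    responses = []  # keep unread responses alive, as a leaky caller would
    try:
        with requests.Session() as session:
            session.trust_env = False  # ignore proxy settings from the environment
            for _ in range(n):
                resp = session.get(url, stream=True)
                if read_body:
                    resp.content  # draining the body returns the connection to the pool
                responses.append(resp)
            return CountingHandler.connections
    finally:
        for resp in responses:
            resp.close()
        server.shutdown()
        server.server_close()
```

   Under these assumptions, `count_connections(read_body=True)` should report a single connection, while `count_connections(read_body=False)` should report one connection per request.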
   
   It turns out uploading objects using the S3 API goes through 
`prepared_request`, which incorrectly sets `stream` to the value of `raw` 
(`True` in our case). Since we don't read the response data, connections 
are never reused, and each upload requires its own connection.
   
   This is particularly wasteful when uploading many small objects, which can 
easily happen with JSON or Parquet files generated by Apache Spark. For such 
objects, setting up the connection takes significant time compared to uploading 
a few bytes.
   
   Setting `stream=stream` in the `prepared_request` method matches the code in 
the `request` method and fixes the bug.
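   
   The shape of the fix can be sketched as follows. This is not libcloud's actual code: `UploadClient` and `RecordingSession` are hypothetical stand-ins, used only to show the `stream` argument being forwarded independently of `raw`. It assumes `requests` is installed.

```python
import requests


class RecordingSession:
    """Hypothetical stand-in for requests.Session that records `stream`."""

    def __init__(self):
        self.stream_flags = []

    def send(self, prepared, stream=False, **kwargs):
        self.stream_flags.append(stream)
        return None  # no network traffic in this sketch


class UploadClient:
    """Hypothetical wrapper; not libcloud's actual connection class."""

    def __init__(self, session):
        self.session = session

    def prepared_request(self, method, url, body=None, raw=False, stream=False):
        prepared = requests.Request(method, url, data=body).prepare()
        # The buggy variant passed stream=raw here, so every raw upload
        # left its response unread and its connection out of the pool.
        return self.session.send(prepared, stream=stream)
```

   With this version, a call with `raw=True` and the default `stream=False` no longer asks the session to stream the response.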
   
   ### Status
   
   - work in progress
   
   ### Checklist (tick everything that applies)
   
   - [x] [Code linting](http://libcloud.readthedocs.org/en/latest/development.html#code-style-guide) (required, can be done after the PR checks)
   - [x] Documentation
   - [x] [Tests](http://libcloud.readthedocs.org/en/latest/testing.html)
   - [x] [ICLA](http://libcloud.readthedocs.org/en/latest/development.html#contributing-bigger-changes) (required for bigger changes)
   
   cc @Kami @tonybaloney 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
