StuartHadfield commented on issue #34892:
URL: https://github.com/apache/arrow/issues/34892#issuecomment-1527388677

   Chiming in here as I'm a pyarrow user who is having immense difficulty with 
this. Food for thought:
   
   Imagine a scenario with a nearly continuous influx of data that you need to 
write as Parquet and store on S3. A backoff strategy works well for a single 
write, but with a heavy incoming stream, getting rate limited and backing off 
risks falling so far behind that it becomes very difficult to catch up.
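   For concreteness, the backoff strategy I mean is something like the minimal 
sketch below. It is not a pyarrow API; `write_fn` is a hypothetical stand-in for 
whatever call performs the S3 PUT, and it is assumed to raise an exception when 
throttled.

```python
import random
import time


def write_with_backoff(write_fn, max_attempts=5, base_delay=0.5):
    """Retry write_fn with exponential backoff and jitter.

    write_fn is a hypothetical callable standing in for an S3 upload;
    it is assumed to raise on a throttling error. Each failed attempt
    waits roughly twice as long as the previous one, with random
    jitter so many writers don't retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return write_fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.5)
            time.sleep(delay)
```

The catch the comment describes is visible here: every retry adds `delay` of 
dead time per write, so under sustained throttling the waiting compounds and the 
writer falls behind the incoming stream.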
   
   This is, of course, hypothetical, but it illustrates that whilst throttling 
and retry with backoff would be *very* useful for 90% of use cases (and I would 
certainly appreciate them; I just do not possess the programming skill to 
implement them here :( ), there are some niche circumstances where we may need 
to consider batching writes more efficiently.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]