[GitHub] [beam] pabloem commented on pull request #25173: Fix Jdbc Write after window assigned

via GitHub Wed, 25 Jan 2023 16:26:54 -0800


pabloem commented on PR #25173:
URL: https://github.com/apache/beam/pull/25173#issuecomment-1404398659


   Autosharding is a particular capability of the Dataflow runner. See: 
https://cloud.google.com/blog/products/data-analytics/3x-dataflow-throughput-auto-sharding-bigquery
   
   It alows us to dynamically grow the number of individual shards by the 
number of streaming workers, instead of having a fixed number of shards, which 
is the case for other runners. The outcome is that 
   
   - Without autosharding: for a given key X, all of the elements for that key 
will end up processed serially (usually by the same worker)
   - With autosharding, GroupIntoBatches uses a ShardedKey where, for a key X, 
elements for that key will be batched into parallel shards that grow 
proportionally to the number of workers


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [beam] pabloem commented on pull request #25173: Fix Jdbc Write after window assigned

Reply via email to