pabloem commented on PR #25173:
URL: https://github.com/apache/beam/pull/25173#issuecomment-1404398659

   Autosharding is a particular capability of the Dataflow runner. See: 
https://cloud.google.com/blog/products/data-analytics/3x-dataflow-throughput-auto-sharding-bigquery
   
   It allows us to dynamically grow the number of individual shards in 
proportion to the number of streaming workers, instead of having a fixed number 
of shards, which is the case for other runners. The outcome is that:
   
   - Without autosharding: for a given key X, all of the elements for that key 
end up being processed serially (usually by the same worker).
   - With autosharding: GroupIntoBatches uses a ShardedKey so that, for a key X, 
the elements for that key are batched into parallel shards, and the number of 
shards grows proportionally to the number of workers.
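
   As a rough illustration of the difference described above, here is a minimal 
pure-Python sketch. It does not use Beam or Dataflow. The helper names 
(`group_by_key`, `group_by_sharded_key`), the fixed `num_shards` argument, and 
the round-robin shard assignment are all assumptions made for the example; with 
autosharding the runner picks the shards itself and changes their number as 
workers scale.

```python
from collections import defaultdict


def group_by_key(elements):
    """Without sharding: every element for a key lands in a single group."""
    groups = defaultdict(list)
    for key, value in elements:
        groups[key].append(value)
    return dict(groups)


def group_by_sharded_key(elements, num_shards):
    """With sharding: elements for a key are spread over (key, shard_id) groups.

    Round-robin assignment is an assumption of this sketch, not how the
    Dataflow runner actually assigns shards.
    """
    groups = defaultdict(list)
    seen_per_key = defaultdict(int)
    for key, value in elements:
        shard_id = seen_per_key[key] % num_shards
        seen_per_key[key] += 1
        groups[(key, shard_id)].append(value)
    return dict(groups)


# Eight elements that all share the key "X".
elements = [("X", i) for i in range(8)]

unsharded = group_by_key(elements)
sharded = group_by_sharded_key(elements, num_shards=4)

print(len(unsharded))  # one group: all of key "X" is handled serially
print(len(sharded))    # four groups that could be batched in parallel
```

   In the Beam Python SDK the corresponding transform is, as far as I know, 
`beam.GroupIntoBatches.WithShardedKey(batch_size)`, which is the ShardedKey 
variant of GroupIntoBatches mentioned above.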

