[GitHub] [beam] nikie edited a comment on pull request #13350: [BEAM-11266] Python IO MongoDB: add bucket_auto aggregation option for bundling in Atlas.

GitBox Sun, 22 Nov 2020 11:52:10 -0800


nikie edited a comment on pull request #13350:
URL: https://github.com/apache/beam/pull/13350#issuecomment-731834985



   @y1chi 
   I have implemented your suggested changes and more (see the last commit 
message for details):
   - auto-bucketing now respects not only the _id range but also the custom 
filter, both when counting documents and in the aggregation itself (this may 
look like extra overhead, but it should give more precise splits);
   - improved unit and integration tests.
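
   To illustrate the first point, here is a minimal sketch of how such an aggregation pipeline could be assembled. It is not the code from this PR: the function name `build_bucket_auto_pipeline`, its parameters, and the rule for choosing the bucket count (total size divided by desired bundle size, rounded up) are assumptions made for illustration. Only the `$match` and `$bucketAuto` stages are standard MongoDB aggregation stages.

```python
import math


def build_bucket_auto_pipeline(start_id, end_id, custom_filter,
                               total_bytes, desired_bundle_bytes):
    """Build a $match + $bucketAuto pipeline (hypothetical helper)."""
    # Restrict the aggregation to the _id range being split.
    range_filter = {"_id": {"$gte": start_id, "$lt": end_id}}
    # Combine with the user's custom filter, if any, so the buckets are
    # computed over the same documents that will actually be read.
    match = {"$and": [custom_filter, range_filter]} if custom_filter else range_filter
    # Assumed sizing rule: enough buckets to keep each near the desired size.
    buckets = max(1, math.ceil(total_bytes / desired_bundle_bytes))
    return [
        {"$match": match},
        {"$bucketAuto": {"groupBy": "$_id", "buckets": buckets}},
    ]


pipeline = build_bucket_auto_pipeline(
    0, 1000, {"status": "active"}, 10_000_000, 4_000_000)
```

   The resulting list could be passed to pymongo's `collection.aggregate(pipeline)`; each returned bucket carries `_id.min` and `_id.max` boundaries that can serve as split points.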
   
   Java's `MongoDbIO` works differently:
   - it has a `numSplits` option that sets the number of auto buckets (10 by 
default) and, if set, also the number of splitVector buckets;
   - it does not estimate a desired bundle size for auto bucketing; it does so 
only in splitVector mode when `numSplits` is not provided, and recalculates the 
bundle size from `numSplits` when it is;
   - it does not apply the custom filter to auto bucketing; the filter is only 
applied to the actual reads for each split bucket;
   - it has no start/stop logic for dynamic rebalancing.
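
   The `numSplits` behaviour described above can be summarised in a small sketch. It is written in Python for comparison only and is not the Java SDK's code: the function `java_style_split_plan` and its parameter names are hypothetical, and the integer division used to recalculate the bundle size is an assumption about how "recalculates bundle size based on `numSplits`" works.

```python
def java_style_split_plan(estimated_bytes, desired_bundle_bytes,
                          num_splits=0, default_auto_buckets=10):
    """Model how numSplits drives both modes (hypothetical helper)."""
    # Auto-bucket mode: the bucket count is numSplits, or 10 when unset.
    auto_buckets = num_splits if num_splits > 0 else default_auto_buckets
    # splitVector mode: keep the estimated desired bundle size unless
    # numSplits is given, in which case derive it from the total size.
    if num_splits > 0:
        split_vector_bundle_bytes = estimated_bytes // num_splits
    else:
        split_vector_bundle_bytes = desired_bundle_bytes
    return {
        "auto_buckets": auto_buckets,
        "split_vector_bundle_bytes": split_vector_bundle_bytes,
    }


default_plan = java_style_split_plan(1000, 64)
explicit_plan = java_style_split_plan(1000, 64, num_splits=4)
```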


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
