nikie commented on pull request #13350: URL: https://github.com/apache/beam/pull/13350#issuecomment-731834985
@y1chi I have implemented your suggested changes and more (see the last commit message for details):
- auto-bucketing now respects not only the `_id` range but also the custom filter, for both document counting and the aggregation (this might feel like overhead, but it should provide more precise splits);
- improved unit and integration tests.

Java's `MongoDBIO` works differently:
- there is a `numSplit` option which controls the number of auto buckets (10 by default) and the number of splitVector buckets if set;
- it does not estimate the desired bundle size for auto bucketing, only for splitVector mode if `numSplit` is not provided;
- it does not use the custom filter for auto bucketing; it only filters the actual reads as per the split buckets;
- it does not have start/stop logic for dynamic rebalancing.
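
To illustrate the auto-bucketing idea described above, here is a minimal sketch, not the code from this PR. It assumes the `_id` range is half-open (`start_id` inclusive, `end_id` exclusive) and that the custom filter is an ordinary MongoDB query document. The function names `combined_filter` and `bucket_auto_pipeline` are hypothetical; `$match` and `$bucketAuto` are standard MongoDB aggregation stages.

```python
from typing import Any, Dict, List, Optional


def combined_filter(
    start_id: Any = None,
    end_id: Any = None,
    custom_filter: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge an _id range with a user-supplied filter (hypothetical helper).

    Assumption: the range is half-open, [start_id, end_id).
    """
    id_range: Dict[str, Any] = {}
    if start_id is not None:
        id_range["$gte"] = start_id
    if end_id is not None:
        id_range["$lt"] = end_id

    clauses: List[Dict[str, Any]] = []
    if id_range:
        clauses.append({"_id": id_range})
    if custom_filter:
        clauses.append(custom_filter)

    if not clauses:
        return {}
    if len(clauses) == 1:
        return clauses[0]
    # Use $and so a custom filter that also mentions _id is not overwritten.
    return {"$and": clauses}


def bucket_auto_pipeline(
    start_id: Any,
    end_id: Any,
    custom_filter: Optional[Dict[str, Any]],
    num_buckets: int,
) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline that splits the matching documents
    into roughly equal buckets by _id (hypothetical helper)."""
    return [
        # Apply the same filter that the document count would use,
        # so the buckets reflect only the documents that will be read.
        {"$match": combined_filter(start_id, end_id, custom_filter)},
        {"$bucketAuto": {"groupBy": "$_id", "buckets": num_buckets}},
    ]
```

The resulting list could be passed to a collection's `aggregate` call; each returned bucket carries `min` and `max` boundaries in its `_id` field, which can serve as split points. Applying the filter before bucketing is what makes the splits track the filtered data rather than the whole collection.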