[ https://issues.apache.org/jira/browse/BEAM-3737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507150#comment-16507150 ]
Debasish Das edited comment on BEAM-3737 at 6/9/18 8:21 PM:
------------------------------------------------------------

I saw this being mentioned in TFMA ([https://github.com/tensorflow/model-analysis/blob/master/tensorflow_model_analysis/api/impl/evaluate.py], _AggregateCombineFn). I am not clear why BatchElements() is needed. groupByKey takes a combiner, which should run on both the map side and the reduce side. Am I missing something here? Is it the case that the Beam combiner does not run on the map side? [~robertwb], is that why you mentioned that we should run the combiner upfront in a ParDo and then run groupByKey to get both map-side and reduce-side combining?

was (Author: debasish83):
I saw this being mentioned in TFMA. I am also not clear why BatchElements() is needed. groupByKey takes a combiner, which should run on both the map side and the reduce side. Am I missing something here? Is it the case that the Beam combiner does not run on the map side? [~robertwb], is that why you mentioned that we should run the combiner upfront in a ParDo and then run groupByKey to get both map-side and reduce-side combining?

> Key-aware batching function
> ---------------------------
>
> Key: BEAM-3737
> URL: https://issues.apache.org/jira/browse/BEAM-3737
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py-core
> Reporter: Chuan Yu Foo
> Priority: Major
>
> I have a CombineFn for which add_input has very large overhead. I would like
> to batch the incoming elements into a large batch before each call to
> add_input to reduce this overhead. In other words, I would like to do
> something like:
> {{elements | GroupByKey() | BatchElements() | CombineValues(MyCombineFn())}}
> Unfortunately, BatchElements is not key-aware, and can't be used after a
> GroupByKey to batch elements per key. I'm working around this by doing the
> batching within CombineValues, which makes the CombineFn rather messy. It
> would be nice if there were a key-aware BatchElements transform which could
> be used in this context.
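The issue asks for a key-aware BatchElements that can sit between GroupByKey and CombineValues. The following is a minimal sketch of one way to approximate it. It assumes each key's grouped values fit in worker memory. to_batches is a hypothetical helper, not an existing Beam transform: it splits one (key, values) pair into (key, list of batches). Because CombineValues applies the CombineFn to the iterable of values for each key, each call to add_input then receives a whole batch instead of a single element.

```python
from typing import Any, Iterable, List, Tuple


def to_batches(
    kv: Tuple[Any, Iterable[Any]], batch_size: int
) -> Tuple[Any, List[List[Any]]]:
    """Regroup one (key, values) pair from GroupByKey into (key, batches).

    Hypothetical helper: every batch holds at most `batch_size` values and
    all batches stay under the original key, so no value crosses keys.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    key, values = kv
    batches: List[List[Any]] = []
    current: List[Any] = []
    for value in values:
        current.append(value)
        if len(current) == batch_size:
            batches.append(current)
            current = []
    if current:  # keep the final, possibly shorter, batch
        batches.append(current)
    return key, batches
```

In a pipeline this could be used as {{elements | beam.GroupByKey() | beam.Map(to_batches, batch_size=100) | beam.CombineValues(MyBatchCombineFn())}}. Here MyBatchCombineFn is a hypothetical CombineFn whose add_input accepts a list. Since the batching happens after the GroupByKey, this only reduces the number of add_input calls. It does nothing for map-side combining, and it materializes each key's values in a list.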
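The comment asks whether the combiner runs on the map side, and mentions running the combiner upfront in a ParDo and then grouping. That amounts to splitting the combine into a partial step before the shuffle and a merge step after it. The following is a minimal plain-Python sketch of that split, with no Beam dependency, under these assumptions:

* BatchingMeanFn, combine_two_phase and _process_batch are hypothetical names.
* The class only mirrors the create_accumulator / add_input / merge_accumulators / extract_output methods of a Beam CombineFn. In a real pipeline it would subclass apache_beam.CombineFn.
* A "bundle" is modeled as a plain list of (key, value) pairs.

The class also buffers inputs so that the expensive step runs once per batch. This is the in-CombineFn workaround the issue describes.

```python
from collections import defaultdict


class BatchingMeanFn:
    """CombineFn-shaped mean whose per-input work runs once per batch."""

    def __init__(self, batch_size=100):
        if batch_size < 1:
            raise ValueError("batch_size must be >= 1")
        self.batch_size = batch_size

    def create_accumulator(self):
        # [running sum, running count, inputs not yet folded in]
        return [0.0, 0, []]

    @staticmethod
    def _process_batch(values):
        # Hypothetical stand-in for the expensive per-call work in the issue;
        # here it only sums and counts, but it runs once per batch.
        return sum(values), len(values)

    def _flush(self, acc):
        # Fold the pending inputs into the running totals (mutates acc).
        if acc[2]:
            batch_sum, batch_count = self._process_batch(acc[2])
            acc[0] += batch_sum
            acc[1] += batch_count
            acc[2] = []
        return acc

    def add_input(self, acc, value):
        acc[2].append(value)
        if len(acc[2]) >= self.batch_size:
            self._flush(acc)
        return acc

    def merge_accumulators(self, accs):
        # Build a fresh accumulator so the inputs are never mutated.
        merged = self.create_accumulator()
        for acc in accs:
            merged[0] += acc[0]
            merged[1] += acc[1]
            merged[2].extend(acc[2])
        if len(merged[2]) >= self.batch_size:
            self._flush(merged)
        return merged

    def extract_output(self, acc):
        # Fold in any leftover partial batch without mutating acc.
        batch_sum, batch_count = self._process_batch(acc[2])
        total, count = acc[0] + batch_sum, acc[1] + batch_count
        return total / count if count else float("nan")


def combine_two_phase(bundles, fn):
    """Simulate map-side partial combining, a shuffle, then a reduce-side merge."""
    # Map side: one partial accumulator per key per bundle, which is what a
    # ParDo running the combiner "upfront" would emit before GroupByKey.
    partials = []
    for bundle in bundles:
        accs = {}
        for key, value in bundle:
            acc = accs.get(key)
            if acc is None:
                acc = fn.create_accumulator()
            accs[key] = fn.add_input(acc, value)
        partials.extend(accs.items())
    # Shuffle: group the partial accumulators by key (the GroupByKey step).
    grouped = defaultdict(list)
    for key, acc in partials:
        grouped[key].append(acc)
    # Reduce side: merge each key's partials and extract the final result.
    return {key: fn.extract_output(fn.merge_accumulators(accs))
            for key, accs in grouped.items()}
```

For example, combine_two_phase([[("a", 1), ("a", 2), ("b", 10)], [("a", 3), ("b", 20), ("b", 30)]], BatchingMeanFn(batch_size=3)) yields a mean of 2.0 for "a" and 20.0 for "b". Runners may apply this kind of split to CombinePerKey on their own (often called combiner lifting). Whether a particular runner does so, and whether it does the same for a GroupByKey followed by CombineValues, is runner-specific and is not something this sketch settles.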
-- This message was sent by Atlassian JIRA (v7.6.3#76005)