[ https://issues.apache.org/jira/browse/BEAM-3737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16387286#comment-16387286 ]
Kenneth Knowles commented on BEAM-3737:
---------------------------------------

Actually I may have misinterpreted this, but I think [~robertwb] has context for this. Unassigning for now.

> Key-aware batching function
> ---------------------------
>
>                 Key: BEAM-3737
>                 URL: https://issues.apache.org/jira/browse/BEAM-3737
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-py-core
>            Reporter: Chuan Yu Foo
>            Priority: Major
>
> I have a CombineFn for which add_input has very large overhead. I would like
> to batch the incoming elements into a large batch before each call to
> add_input to reduce this overhead. In other words, I would like to do
> something like:
> {{elements | GroupByKey() | BatchElements() | CombineValues(MyCombineFn())}}
> Unfortunately, BatchElements is not key-aware, and can't be used after a
> GroupByKey to batch elements per key. I'm working around this by doing the
> batching within CombineValues, which makes the CombineFn rather messy. It
> would be nice if there were a key-aware BatchElements transform which could
> be used in this context.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
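
For illustration only, here is a minimal pure-Python sketch of the per-key batching the issue asks for. It is not Beam's API and not the reporter's workaround: the helper name batch_values_per_key and the batch_size parameter are hypothetical, and the input is assumed to be a (key, values) pair of the shape a GroupByKey would produce.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def batch_values_per_key(
    keyed_values: Tuple[K, Iterable[V]], batch_size: int
) -> Iterator[Tuple[K, List[V]]]:
    """Yield (key, batch) pairs with at most batch_size values per batch.

    Hypothetical helper: it keeps the key attached to every batch, which is
    the "key-aware" behaviour the issue describes.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    key, values = keyed_values
    iterator = iter(values)
    while True:
        # Pull up to batch_size values without materialising the whole group.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield key, batch


# Stand-in for the output of a GroupByKey: one (key, values) pair per key.
grouped = [("a", [1, 2, 3, 4, 5]), ("b", [6, 7])]
batched = [
    pair
    for keyed_values in grouped
    for pair in batch_values_per_key(keyed_values, batch_size=2)
]
```

In a real pipeline a function like this could presumably be applied after GroupByKey with beam.FlatMap, but that wiring is an assumption and untested here; a downstream combiner would also need to accept whole batches as its inputs.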