[ https://issues.apache.org/jira/browse/BEAM-10409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yifan Mai updated BEAM-10409:
-----------------------------
    Resolution: Fixed
        Status: Resolved  (was: Open)

> Add combiner packing to graph optimizer phases
> ----------------------------------------------
>
>                 Key: BEAM-10409
>                 URL: https://issues.apache.org/jira/browse/BEAM-10409
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-core
>            Reporter: Yifan Mai
>            Assignee: Yifan Mai
>            Priority: P2
>              Labels: stale-assigned
>          Time Spent: 8h
>  Remaining Estimate: 0h
>
> Some use cases of Beam (e.g. [TensorFlow Transform|https://github.com/tensorflow/transform]) create thousands of Combine stages with a common parent. The large number of stages can cause performance issues on some runners. To alleviate this, a graph optimization phase could be added to the translations module that packs compatible Combine stages into a single stage.
> The graph optimization for CombinePerKey would work as follows: if CombinePerKey stages have a common input, one input each, and one output each, pack the stages into a single stage that runs all CombinePerKeys and outputs the resulting tuples to a new PCollection. A subsequent stage unpacks tuples from this PCollection and sends them to the original output PCollections.
> There is an additional issue with supporting this for CombineGlobally: because of the intermediate KeyWithVoid stage between the CombinePerKey stages and the input stage, the CombinePerKey stages do not have a common input stage and cannot be packed. To support CombineGlobally, a common-sibling-elimination graph optimization phase can first be used to merge the KeyWithVoid stages. After this, the CombinePerKey stages have a common input and can be packed.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
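The packing described above can be illustrated with a minimal, runner-independent Python sketch. The class and function names (`PackedCombineFn`, `combine_per_key`, the toy `MeanFn`/`MaxFn` combiners) are illustrative assumptions, not Beam's actual API; the point is only the shape of the optimization: N combiners sharing an input are fused into one combiner that emits a tuple, and a later step routes each tuple element back to its original output.

```python
class MeanFn:
    """Toy combiner: (sum, count) accumulator -> mean."""
    def create_accumulator(self):
        return (0.0, 0)
    def add_input(self, acc, value):
        s, n = acc
        return (s + value, n + 1)
    def extract_output(self, acc):
        s, n = acc
        return s / n if n else float("nan")

class MaxFn:
    """Toy combiner: running maximum."""
    def create_accumulator(self):
        return float("-inf")
    def add_input(self, acc, value):
        return max(acc, value)
    def extract_output(self, acc):
        return acc

class PackedCombineFn:
    """Runs several combiners in lockstep over the same input,
    producing one tuple of outputs instead of N separate stages."""
    def __init__(self, *fns):
        self.fns = fns
    def create_accumulator(self):
        return [fn.create_accumulator() for fn in self.fns]
    def add_input(self, accs, value):
        return [fn.add_input(a, value) for fn, a in zip(self.fns, accs)]
    def extract_output(self, accs):
        return tuple(fn.extract_output(a) for fn, a in zip(self.fns, accs))

def combine_per_key(pairs, fn):
    """Minimal stand-in for a CombinePerKey stage over (key, value) pairs."""
    accs = {}
    for k, v in pairs:
        accs[k] = fn.add_input(accs.get(k, fn.create_accumulator()), v)
    return {k: fn.extract_output(a) for k, a in accs.items()}

# One packed stage replaces two separate CombinePerKey stages.
pairs = [("a", 1.0), ("a", 3.0), ("b", 2.0)]
packed = combine_per_key(pairs, PackedCombineFn(MeanFn(), MaxFn()))

# Unpack stage: route tuple element i back to the i-th original output.
mean_out = {k: t[0] for k, t in packed.items()}
max_out = {k: t[1] for k, t in packed.items()}
```

For CombineGlobally, the same sketch applies after the duplicate KeyWithVoid siblings are merged, since that merge is what gives the CombinePerKey stages the single common input the packing condition requires.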