I'd like to write a Beam PTransform that applies an *existing* Beam transform to each set of grouped values, separately, and combines the result. Is anything like this possible with Beam using the Python SDK?
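
For concreteness, here's a straw-man of what I'd like to be able to write. Nothing like TransformPerKey exists today as far as I can tell, and MyShuffleTransform / get_group are stand-ins for my existing composite transform and keying function:

    import apache_beam as beam

    # Straw-man usage; TransformPerKey is hypothetical, and
    # MyShuffleTransform stands in for my existing composite transform.
    with beam.Pipeline() as p:
        result = (
            p
            | beam.io.ReadFromText('gs://my-bucket/input*')   # illustrative source
            | beam.Map(lambda line: (get_group(line), line))  # key by group (~20 groups)
            | TransformPerKey(MyShuffleTransform())           # apply the transform per group
        )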
Here are the closest things I've come up with:

1. If each set of *inputs* to my transform fits into memory, I could use GroupByKey followed by FlatMap.
2. If each set of *outputs* from my transform fits into memory, I could use CombinePerKey.
3. If I knew the static number of groups ahead of time, I could use Partition, followed by applying my transform multiple times, followed by Flatten (sketched in the P.S. below).

In my scenario, none of these holds true. For example, I currently have ~20 groups of values, with each group holding ~1 TB of data. My custom transform simply shuffles this data around, so each set of outputs is also ~1 TB in size.

In my particular case, it seems my options are either to relax these constraints, or to manually convert each step of my existing transform to apply per key. This conversion is tedious but very straightforward: the GroupByKey and ParDo steps that my transform is built out of just need to deal with an expanded key (see the P.P.S. below).

I wonder, could this be something built into Beam itself, e.g., as TransformPerKey? The PTransforms that result from combining other Beam transforms (e.g., _ChainPTransform in Python) are private, so this seems like something that would need to exist in Beam itself, if it could exist at all.

Cheers,
Stephan
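
P.S. In case it helps, here's a rough sketch of workaround (3). The group keys and the transform itself are stand-ins; the point is just the Partition / apply-N-times / Flatten shape:

    import apache_beam as beam

    GROUP_KEYS = ['group-00', 'group-01']  # known ahead of time; stand-in values

    class MyShuffleTransform(beam.PTransform):
        # Placeholder for my existing composite transform (GroupByKey + ParDo steps).
        def expand(self, pcoll):
            return pcoll | beam.GroupByKey() | beam.FlatMap(
                lambda kv: ((kv[0], v) for v in kv[1]))

    def transform_per_known_group(keyed_pcoll):
        # Route each (key, value) pair to its group's partition.
        partitions = keyed_pcoll | beam.Partition(
            lambda kv, n: GROUP_KEYS.index(kv[0]), len(GROUP_KEYS))
        # Apply the existing transform once per group, with unique labels.
        transformed = [
            part | f'Shuffle{i}' >> MyShuffleTransform()
            for i, part in enumerate(partitions)
        ]
        # Merge the per-group outputs back into one PCollection.
        return tuple(transformed) | beam.Flatten()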
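
P.P.S. And here's the kind of per-step conversion I mean by an "expanded key", for a single GroupByKey step inside my transform; all names are illustrative:

    import apache_beam as beam

    # Original step inside my transform, roughly:
    #   pcoll | beam.GroupByKey() | beam.ParDo(ProcessFn())
    # The per-group version folds the group key into the shuffle key, so a
    # single GroupByKey keeps the groups separate.

    def expand_key(element):
        group_key, (inner_key, value) = element
        return (group_key, inner_key), value  # expanded key

    def contract_key(element):
        (group_key, inner_key), values = element
        return group_key, (inner_key, values)  # restore the (group, payload) shape

    def grouped_step(keyed_pcoll):
        return (keyed_pcoll
                | beam.Map(expand_key)
                | beam.GroupByKey()
                | beam.Map(contract_key))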