I had a question today from one of our users about Beam’s Sample transform (a Combine with an internal top-like function that produces a uniform sample of size n from a PCollection). They also wanted to obtain the rest of the PCollection (the non-sampled elements) as an output.
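To make the request concrete, here is a rough sketch in plain Python (not Beam code, and not how Sample is implemented internally) of the semantics the user is after: one pass that yields both a uniform sample of size n and everything that was left out. The function name `sample_with_rest` and the `seed` parameter are made up for illustration, and the sketch assumes everything fits in memory, which is exactly what does not hold for a PCollection.

```python
import random


def sample_with_rest(elements, n, seed=None):
    """Return (sample, rest) using single-pass reservoir sampling.

    `sample` is a uniform random sample of at most n elements;
    `rest` holds every element that did not end up in the sample.
    Hypothetical helper for illustration only, not a Beam API.
    """
    rng = random.Random(seed)
    sample, rest = [], []
    for i, x in enumerate(elements):
        if i < n:
            # Fill the reservoir with the first n elements.
            sample.append(x)
            continue
        # Pick a slot in [0, i]; the new element is kept with probability n / (i + 1).
        j = rng.randint(0, i)
        if j < n:
            # The evicted element becomes part of the "rest" output.
            rest.append(sample[j])
            sample[j] = x
        else:
            rest.append(x)
    return sample, rest


if __name__ == "__main__":
    sample, rest = sample_with_rest(range(100), 10, seed=1)
    print(len(sample), len(rest))
```

Note that elements evicted from the reservoir move to the second output, so the two outputs together always contain every input element exactly once.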
My suggestion was to use the sample as a side input (since it is small) and then reprocess the collection to filter out the sampled elements. However, I wonder whether this is the ‘best’ solution. I was also wondering: if Combine is essentially GbK + ParDo, why don’t we have a Combine function with multiple outputs (maybe an evolution of CombineWithContext)? I know this sounds weird, and I have probably not thought enough about the issues or the performance of the translation, but I wanted to see what others think. Does this make sense? Do you see any pros/cons, or do you have other ideas?

Thanks,
Ismaël