hjtran commented on code in PR #35601: URL: https://github.com/apache/beam/pull/35601#discussion_r2214530015
########## sdks/python/apache_beam/transforms/core.py: ########## @@ -2927,6 +2927,22 @@ class CombinePerKey(PTransformWithSideInputs): Returns: A PObject holding the result of the combine operation. """ + def __new__(cls, *args, **kwargs): + def has_side_inputs(): + return ( + any(isinstance(arg, pvalue.AsSideInput) for arg in args) or + any(isinstance(arg, pvalue.AsSideInput) for arg in kwargs.values())) + + if has_side_inputs(): + # If the CombineFn has deferred side inputs, the python SDK + # doesn't implement it. + # Use a ParDo-based CombinePerKey instead. + from apache_beam.transforms.combiners import \ + LiftedCombinePerKey + combine_fn, *args = args + return LiftedCombinePerKey(combine_fn, args, kwargs) Review Comment: I'm a bit split. I agree that users will probably not realize this subtle impact from using deferred side inputs with their combiner, but I think ideally there'd also be a way for a user to understand the risk and dismiss the warning. I'm okay with whichever though, let me know if you still think we should have the warning. I suspect that in most cases even knowing that there may be a reduction in performance that it'd be difficult to actually observe (but maybe I'm underestimating the `operations.py` combine optimizations -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org