Hey Joey,

I have a few concerns about moving to a more variable approach to determining types. On the SDK side, we'd have to pull hints from each source (decorators, function annotations, trivial inference) and then compare each individual component of those hints to determine which is the narrowest. In the simplest case, where the inferred hints have a direct inheritance relationship, you take the more specific one. But once you get into types that aren't directly related, or composite types, how do you determine which hint is narrower? It's not hard to imagine a scenario where the nested types within two composites differ in scope in opposite directions, which greatly complicates the evaluation. dict[Any, int] and dict[int, Any] are compatible hints, and I couldn't tell you which one is narrower. We could compare field by field and combine them into dict[int, int]; however, I'm not particularly keen on overriding user-provided type hints with ones we generate ourselves.
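To make the composite-hint problem concrete, here is a minimal sketch of the per-component comparison the SDK would need. It uses plain Python and the standard `typing` helpers rather than Beam's own `apache_beam.typehints` machinery; `is_at_least_as_narrow` and `field_wise_merge` are hypothetical helpers, not Beam APIs. It assumes builtin generics such as `dict[...]` (Python 3.9+) and the PEP 484 convention that an int is acceptable where a float is expected.

```python
from typing import Any, get_args, get_origin


def is_at_least_as_narrow(a: Any, b: Any) -> bool:
    """Hypothetical check: is hint `a` at least as specific as hint `b`?"""
    if b is Any:
        return True   # every hint is at least as narrow as Any
    if a is Any:
        return False  # Any is never narrower than a concrete hint
    origin_a, origin_b = get_origin(a), get_origin(b)
    if origin_a is None and origin_b is None:
        # Plain classes: ordinary subclassing, plus the PEP 484 numeric
        # rule that an int is acceptable where a float is expected.
        return issubclass(a, b) or (a is int and b is float)
    if origin_a is not origin_b or len(get_args(a)) != len(get_args(b)):
        return False
    # Composite hints: every parameter must be at least as narrow.
    return all(is_at_least_as_narrow(x, y)
               for x, y in zip(get_args(a), get_args(b)))


def field_wise_merge(a: Any, b: Any) -> Any:
    """Hypothetical merge that keeps the narrower side of each parameter."""
    if is_at_least_as_narrow(a, b):
        return a
    if is_at_least_as_narrow(b, a):
        return b
    origin = get_origin(a)
    if (origin is not None and origin is get_origin(b)
            and len(get_args(a)) == len(get_args(b))):
        merged = tuple(field_wise_merge(x, y)
                       for x, y in zip(get_args(a), get_args(b)))
        # Rebuild with the builtin generic, e.g. dict[(int, int)] -> dict[int, int].
        return origin[merged]
    raise TypeError(f"no common narrowing of {a} and {b}")


a, b = dict[Any, int], dict[int, Any]
print(is_at_least_as_narrow(a, b), is_at_least_as_narrow(b, a))  # False False
print(field_wise_merge(a, b))  # dict[int, int]
```

Neither dict hint is narrower than the other, so the only way to get a single "narrowest" answer is to synthesize dict[int, int], a hint that neither the user nor any single inference source actually produced.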
Which brings me to the user side: users don't necessarily know which type hints are actually being applied to each transform in their pipeline. If everything works, that's great, but once type-check errors start being thrown and the user can't map those hints back to what they provided (particularly if we expand the use of trivial inference, which is completely invisible to the user), there will be a lot more frustration. Having a consistent hierarchy for how hints get applied makes much more sense (although the current behavior of decorator hints overriding the other sources is not super clear or ideal, IMO).

I'm still an advocate for PEP 484-style hints as the best way to annotate Python code, since they provide static checks while the code is being written in addition to our checks at pipeline construction, and they're the most Pythonic approach. I'm also not a huge fan of expanding our dependency on the trivial inference module, since that code depends on CPython internals that now change with every minor version release.

In the code example given, you've actually changed the definition of extract_x by removing its hints and falling back to trivial inference. Ints are floats, but floats are not ints, so the type error here is between extract_x (which declares a float output) and get_bit_length (which expects an int input). As written, that failure is valid.

I do agree that Rows and Schemas need more robust type checking; there are some issues checking between the two (particularly at xlang transform boundaries). That's a pretty clear gap in the code base right now, and it would be great to close it.

Thanks,
Jack McCluskey

On Tue, Jul 15, 2025 at 7:44 PM Joey Tran <joey.t...@schrodinger.com> wrote:

> Hey all,
>
> @Jack McCluskey <jrmcclus...@google.com>'s great talk on python type
> hinting at the beam summit taught me that the "trivially" inferred types
> are only used if the beam Map/FlatMap/Filter/ParDo functions aren't already
> type hinted. This seems like it could be a waste of information.
> For example, the following pipeline fails due to a type error:
>
> ```
> from typing import Tuple
>
> import apache_beam as beam
>
> def extract_x(coord: Tuple[float, float]) -> float:
>     """Extract the x coordinate from a tuple."""
>     return coord[0]
>
> def get_bit_length(value: int) -> int:
>     """Get the bit length of an integer."""
>     return value.bit_length()
>
> with beam.Pipeline() as p:
>     (p
>      | beam.Create([(1, 2)])
>      | beam.Map(extract_x)
>      | beam.Map(get_bit_length)
>      | beam.LogElements())
> ```
>
> But if you take away the `extract_x` type hints (allowing the types to get
> trivially inferred) it passes and the final pcollection actually has a
> narrower type than before.
>
> Instead of just taking type hints by priority order
> (@typehints.with_output_types > function type hints > trivial_inference),
> would it make sense to just take the narrowest type of the three type
> sources? I suspect we'd need to have some kind of phase-in period to do
> something like this but it seems worth the extra type checking security.
>
> I think trivially inferred type hints might be particularly useful as we
> introduce more schema-oriented transforms
> (e.g. @with_input/output_type(beam.Row) will have very little typing
> information to check with)