> My two cents: I think it's valuable for explicit type hints to
> override implicit ones for a couple of reasons:
>
> - If there's a bug in inference, at least there's a way to override
>   that.

We could still allow a way to force an override, e.g.
`Map(extract_z).with_output_types(int, force=True)`.
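Sketching what that might look like in context (a minimal sketch: the
`force` keyword is the hypothetical part and doesn't exist today, and
`extract_z` just stands in for any function whose inferred type we want
to overrule):

```
import apache_beam as beam

def extract_z(coord):
    """Extract the z coordinate; trivial inference might narrow this to int."""
    return coord[2]

with beam.Pipeline() as p:
    (p
     | beam.Create([(1, 2, 3)])
     # Hypothetical `force=True`: make the explicit hint win even when
     # trivial inference (or a bug in it) disagrees.
     | beam.Map(extract_z).with_output_types(int, force=True)
     | beam.LogElements())
```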
> - Explicit type hints can better capture intent. E.g. in the above
>   example, one might type the output to be of type float to reserve
>   the freedom of doing different arithmetic operations in the future
>   (i.e. it declares a contract that may be stricter than the current
>   implementation). This can also be useful when types get more
>   complex, e.g. saying the output is a Map[int, Sequence[int]] rather
>   than some monstrosity of a union based on the particular
>   implementation.

This was a point I was attempting to make with
`extract_x(coord: Tuple[float, float])`. It's typed to allow use with
any kind of numeric coordinate, but in a pipeline that processes
integer coordinates the explicit type hint is too broad.

> - If one sees type hints, one can reason about that alone, rather
>   than having to wonder if implicit inference will narrow them.

This (and Jack's similar point) is a good point.

> It would be nice to get rid of the (now almost entirely redundant)
> with_input/output_type that we needed in the Python 2 days, and huge
> +1 to improving Row hinting.

I've never used with_input/output_type on a DoFn, but I've actually
found it really nice on composite transforms, as it gives a very
complete overview of what the transform does, e.g.

```
@with_input_types(Tuple[float, float, float])
@with_output_types(float)
class CalculateMagnitude(PTransform):
    ...
```

Without it, you need to look through `.expand` to figure out what's
expected as input/output.

> - Robert

> Which brings me to the user side of not necessarily knowing which
> type hints are actually being applied to each transform in their
> pipeline. If everything works that's great, but once type check
> errors start being thrown and the user cannot directly map those
> hints to what they provided (particularly by expanding the use of
> trivial inference, which is fully invisible to the user) there will
> be a lot more frustration. Having a consistent hierarchy of how hints
> get applied makes much more sense (although the function decorator
> approach currently overriding other methods is not super clear or
> ideal, IMO). I'm still an advocate for PEP 484 style hints being the
> best way to annotate Python code, since it gives static checks at
> code-writing time in addition to our pipeline construction checks and
> is the most Pythonic approach. I'm also not a huge fan of expanding
> dependency on the trivial inference module, since that code depends
> on CPython internals that get updated every minor version release
> now.
>
> For the code example given, you've actually changed the definition of
> extract_x by removing the hints and going to trivial inference. Ints
> are floats but floats are not ints, so the type error here is between
> extract_x and get_bit_length. As written, that failure is valid.

+1 on it probably being a frustrating user experience to have to debug
invisible type hints. I also agree that PEP 484 style hints are the
best practice and gold standard. The example I wrote was meant to
illustrate a case where a nicely typed function can still cause type
failures that aren't actually indicative of an invalid pipeline.
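(Side note on the invisibility concern: the inference machinery can at
least be inspected directly. A minimal sketch, assuming the current
`apache_beam.typehints.trivial_inference` module; the exact repr of the
result may vary by Beam version:)

```
from apache_beam import typehints
from apache_beam.typehints import trivial_inference

def extract_x(coord):
    # Deliberately unannotated, so Beam would fall back to trivial inference.
    return coord[0]

# Ask the inference machinery what extract_x returns for Tuple[int, int]
# elements; it narrows the result to int rather than float.
inferred = trivial_inference.infer_return_type(
    extract_x, [typehints.Tuple[int, int]])
print(inferred)  # expected: int
```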
Reproducing the pipeline:

```
with beam.Pipeline() as p:
    (p
     | beam.Create([(1, 2)])
     | beam.Map(extract_x)
     | beam.Map(get_bit_length)
     | beam.LogElements())
```

In this pipeline, `extract_x` is written to work generically with any
number, but because it was nicely type hinted, it causes the pipeline
to raise a type issue: `get_bit_length` expects ints and `extract_x`
apparently returns floats. But, as trivial_inference is able to work
out, it actually returns ints if given coordinates of ints. You can
only get this pipeline to work by forcibly overriding with
`beam.Map(extract_x).with_output_types(int)`.

> I do agree that Rows and Schemas need more robust type checking;
> there are some issues checking between the two (particularly at xlang
> transform boundaries). That's a pretty clear gap in the code base
> right now that would be great to close.

The motivation for this blended system actually came from this gap. If
we had a blended typing system, then something like

```
def row_identity(inp_row: beam.Row) -> beam.Row:
    return inp_row

(p
 | beam.Create([beam.Row(x=1, y=2)])
 | beam.Map(row_identity))
```

would be able to infer the row schema automatically rather than
resorting to the user-typed and overly general `beam.Row`. That said,
I do agree that user confusion over whether their type errors come
from inferred or explicit types is a sticking point.

> Thanks,
>
> Jack McCluskey
>
> On Tue, Jul 15, 2025 at 7:44 PM Joey Tran <joey.t...@schrodinger.com>
> wrote:
>>
>> Hey all,
>>
>> @Jack McCluskey's great talk on python type hinting at the beam
>> summit taught me that the "trivially" inferred types are only used
>> if the beam Map/FlatMap/Filter/ParDo functions aren't already type
>> hinted. This seems like it could be a waste of information. For
>> example, the following pipeline fails due to a type error:
>>
>> ```
>> def extract_x(coord: Tuple[float, float]) -> float:
>>     """Extract the x coordinate from a tuple."""
>>     return coord[0]
>>
>> def get_bit_length(value: int) -> int:
>>     """Get the bit length of an integer."""
>>     return value.bit_length()
>>
>> with beam.Pipeline() as p:
>>     (p
>>      | beam.Create([(1, 2)])
>>      | beam.Map(extract_x)
>>      | beam.Map(get_bit_length)
>>      | beam.LogElements())
>> ```
>>
>> But if you take away the `extract_x` type hints (allowing the types
>> to get trivially inferred), it passes, and the final PCollection
>> actually has a narrower type than before.
>>
>> Instead of just taking type hints in priority order
>> (@typehints.with_output_types > function type hints >
>> trivial_inference), would it make sense to take the narrowest type
>> of the three sources? I suspect we'd need some kind of phase-in
>> period to do something like this, but it seems worth the extra
>> type-checking security.
>>
>> I think trivially inferred type hints might be particularly useful
>> as we introduce more schema-oriented transforms (e.g.
>> @with_input/output_types(beam.Row) will have very little typing
>> information to check with).
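P.S. To make the "take the narrowest of the three sources" idea a bit
more concrete, here's a rough sketch. `narrowest` is a hypothetical
helper, not anything in Beam, and it punts on reporting genuinely
conflicting hints. The demo uses bool/int because plain `issubclass`
covers that pair; whether `is_consistent_with` honors the
int-is-a-float numeric tower is something I haven't verified:

```
from apache_beam.typehints.typehints import is_consistent_with

def narrowest(*candidates):
    """Hypothetical resolver: return the narrowest available type hint.

    Candidates are, e.g., (decorator_hint, annotation_hint,
    inferred_hint), with None for absent sources. A real implementation
    would need to flag conflicting hints rather than silently picking.
    """
    result = None
    for candidate in candidates:
        if candidate is None:
            continue
        if result is None or is_consistent_with(candidate, result):
            # `candidate` is at least as specific as what we have so far.
            result = candidate
    return result

# No decorator hint, annotation says int, inference says bool (a
# subtype of int): the narrowest of the three is bool.
print(narrowest(None, int, bool))  # -> <class 'bool'>
```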