> My two cents: I think it's valuable for explicit type hints to
> override implicit ones for a couple of reasons:
>
> - If there's a bug in inference, at least there's a way to override
>   that.

We could still allow a way to force an override, e.g.
`Map(extract_z).with_output_types(int, force=True)`.
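Sketching what that might look like in context (a minimal sketch: the
`force` keyword is the hypothetical part and doesn't exist today, and
`extract_z` just stands in for any function whose inferred type we want
to overrule):

```
import apache_beam as beam

def extract_z(coord):
    """Extract the z coordinate; trivial inference might narrow this to int."""
    return coord[2]

with beam.Pipeline() as p:
    (p
     | beam.Create([(1, 2, 3)])
     # Hypothetical `force=True`: make the explicit hint win even when
     # trivial inference (or a bug in it) disagrees.
     | beam.Map(extract_z).with_output_types(int, force=True)
     | beam.LogElements())
```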
> - Explicit type hints can better capture intent. E.g. in the above
>   example, one might type the output to be of type float to reserve
>   the freedom of doing different arithmetic operations in the future
>   (i.e. it declares a contract that may be stricter than the current
>   implementation). This can also be useful when types get more
>   complex, e.g. saying the output is a Map[int, Sequence[int]] rather
>   than some monstrosity of a union based on the particular
>   implementation.

This was a point I was attempting to make with
`extract_x(coord: Tuple[float, float])`. It's typed to allow use with
any kind of numeric coordinate, but in a pipeline that processes
integer coordinates the explicit type hint is too broad.

> - If one sees type hints, one can reason about that alone, rather
>   than having to wonder if implicit inference will narrow them.

This (and Jack's similar point) is a good point.

> It would be nice to get rid of the (now almost entirely redundant)
> with_input/output_type that we needed in the Python 2 days, and huge
> +1 to improving Row hinting.

I've never used with_input/output_type on a DoFn, but I've actually
found it really nice on composite transforms, as it gives a very
complete overview of what the transform does, e.g.

```
@with_input_types(Tuple[float, float, float])
@with_output_types(float)
class CalculateMagnitude(PTransform):
    ...
```

Without it, you need to look through `.expand` to figure out what's
expected as input/output.

> - Robert

> Which brings me to the user side of not necessarily knowing which
> type hints are actually being applied to each transform in their
> pipeline. If everything works that's great, but once type check
> errors start being thrown and the user cannot directly map those
> hints to what they provided (particularly by expanding the use of
> trivial inference, which is fully invisible to the user) there will
> be a lot more frustration. Having a consistent hierarchy of how hints
> get applied makes much more sense (although the function decorator
> approach currently overriding other methods is not super clear or
> ideal, IMO). I'm still an advocate for PEP 484 style hints being the
> best way to annotate Python code, since it gives static checks at
> code-writing time in addition to our pipeline construction checks and
> is the most Pythonic approach. I'm also not a huge fan of expanding
> dependency on the trivial inference module, since that code depends
> on CPython internals that get updated every minor version release
> now.
>
> For the code example given, you've actually changed the definition of
> extract_x by removing the hints and going to trivial inference. Ints
> are floats but floats are not ints, so the type error here is between
> extract_x and get_bit_length. As written, that failure is valid.

+1 on it probably being a frustrating user experience to have to debug
invisible type hints. I also agree that PEP 484 style hints are the
best practice and gold standard. The example I wrote was meant to
illustrate a case where a nicely typed function can still cause type
failures that aren't actually indicative of an invalid pipeline.
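(Side note on the invisibility concern: the inference machinery can at
least be inspected directly. A minimal sketch, assuming the current
`apache_beam.typehints.trivial_inference` module; the exact repr of the
result may vary by Beam version:)

```
from apache_beam import typehints
from apache_beam.typehints import trivial_inference

def extract_x(coord):
    # Deliberately unannotated, so Beam would fall back to trivial inference.
    return coord[0]

# Ask the inference machinery what extract_x returns for Tuple[int, int]
# elements; it narrows the result to int rather than float.
inferred = trivial_inference.infer_return_type(
    extract_x, [typehints.Tuple[int, int]])
print(inferred)  # expected: int
```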
Reproducing the pipeline:

```
with beam.Pipeline() as p:
    (p
     | beam.Create([(1, 2)])
     | beam.Map(extract_x)
     | beam.Map(get_bit_length)
     | beam.LogElements())
```

In this pipeline, `extract_x` is written to work generically with any
number, but because it was nicely type hinted, it causes the pipeline
to raise a type issue: `get_bit_length` expects ints and `extract_x`
apparently returns floats. But, as trivial_inference is able to work
out, it actually returns ints if given coordinates of ints. You can
only get this pipeline to work by forcibly overriding with
`beam.Map(extract_x).with_output_types(int)`.

> I do agree that Rows and Schemas need more robust type checking;
> there are some issues checking between the two (particularly at xlang
> transform boundaries). That's a pretty clear gap in the code base
> right now that would be great to close.

The motivation for this blended system actually came from this gap. If
we had a blended typing system, then something like

```
def row_identity(inp_row: beam.Row) -> beam.Row:
    return inp_row

(p
 | beam.Create([beam.Row(x=1, y=2)])
 | beam.Map(row_identity))
```

would be able to infer the row schema automatically rather than
resorting to the user-typed and overly general `beam.Row`. That said,
I do agree that user confusion over whether their type errors come
from inferred or explicit types is a sticking point.

> Thanks,
>
> Jack McCluskey
>
> On Tue, Jul 15, 2025 at 7:44 PM Joey Tran <joey.t...@schrodinger.com>
> wrote:
>>
>> Hey all,
>>
>> @Jack McCluskey's great talk on python type hinting at the beam
>> summit taught me that the "trivially" inferred types are only used
>> if the beam Map/FlatMap/Filter/ParDo functions aren't already type
>> hinted. This seems like it could be a waste of information. For
>> example, the following pipeline fails due to a type error:
>>
>> ```
>> def extract_x(coord: Tuple[float, float]) -> float:
>>     """Extract the x coordinate from a tuple."""
>>     return coord[0]
>>
>> def get_bit_length(value: int) -> int:
>>     """Get the bit length of an integer."""
>>     return value.bit_length()
>>
>> with beam.Pipeline() as p:
>>     (p
>>      | beam.Create([(1, 2)])
>>      | beam.Map(extract_x)
>>      | beam.Map(get_bit_length)
>>      | beam.LogElements())
>> ```
>>
>> But if you take away the `extract_x` type hints (allowing the types
>> to get trivially inferred), it passes, and the final PCollection
>> actually has a narrower type than before.
>>
>> Instead of just taking type hints in priority order
>> (@typehints.with_output_types > function type hints >
>> trivial_inference), would it make sense to take the narrowest type
>> of the three sources? I suspect we'd need some kind of phase-in
>> period to do something like this, but it seems worth the extra
>> type-checking security.
>>
>> I think trivially inferred type hints might be particularly useful
>> as we introduce more schema-oriented transforms (e.g.
>> @with_input/output_types(beam.Row) will have very little typing
>> information to check with).
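P.S. To make the "take the narrowest of the three sources" idea a bit
more concrete, here's a rough sketch. `narrowest` is a hypothetical
helper, not anything in Beam, and it punts on reporting genuinely
conflicting hints. The demo uses bool/int because plain `issubclass`
covers that pair; whether `is_consistent_with` honors the
int-is-a-float numeric tower is something I haven't verified:

```
from apache_beam.typehints.typehints import is_consistent_with

def narrowest(*candidates):
    """Hypothetical resolver: return the narrowest available type hint.

    Candidates are, e.g., (decorator_hint, annotation_hint,
    inferred_hint), with None for absent sources. A real implementation
    would need to flag conflicting hints rather than silently picking.
    """
    result = None
    for candidate in candidates:
        if candidate is None:
            continue
        if result is None or is_consistent_with(candidate, result):
            # `candidate` is at least as specific as what we have so far.
            result = candidate
    return result

# No decorator hint, annotation says int, inference says bool (a
# subtype of int): the narrowest of the three is bool.
print(narrowest(None, int, bool))  # -> <class 'bool'>
```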