Hey Joey,

I have a few concerns around creating a more variable approach to
determining types. On the SDK side we'd be trying to pull hints from each
route and then doing some comparison across each individual component of
the hint to determine which is the narrowest. For the simplest case,
where the inferred hints are directly related by inheritance, you just take
the more specific one. But once you get into types that aren't directly
related, or something like a composite type, how do you determine which
hint is narrower? It's not too hard to imagine a scenario where nested
types within two composites are of varying scope, which greatly complicates
the evaluation. For example, dict[Any, int] and dict[int, Any] are
compatible hints, but neither one is narrower than the other. We could do
some level of comparison at each field and combine the hints into
dict[int, int]; however, I'm not particularly keen on the idea of
overriding user-provided type hints with our own generated ones.
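
A rough sketch of the problem in plain Python, with no Beam involved.
at_least_as_narrow is a made-up helper, not the SDK's consistency check,
and it treats every type parameter as covariant, which is a simplification:

```python
from typing import Any, get_args, get_origin


def at_least_as_narrow(a, b) -> bool:
    """Toy check: is hint `a` at least as narrow as hint `b`?

    Hypothetical helper for illustration only, not the SDK's consistency check.
    """
    if b is Any:
        return True  # every hint is at least as narrow as Any
    if a is Any:
        return False  # Any is never narrower than a concrete hint
    origin_a, origin_b = get_origin(a), get_origin(b)
    if origin_a is None and origin_b is None:
        return issubclass(a, b)  # plain classes: fall back to inheritance
    if origin_a is not origin_b:
        return False  # unrelated containers are not comparable here
    args_a, args_b = get_args(a), get_args(b)
    # Parameterized hints: every component must be at least as narrow.
    return len(args_a) == len(args_b) and all(
        at_least_as_narrow(x, y) for x, y in zip(args_a, args_b))


left = dict[Any, int]
right = dict[int, Any]
merged = dict[int, int]

# Neither original hint is narrower than the other...
print(at_least_as_narrow(left, right), at_least_as_narrow(right, left))
# ...while the field-by-field combination is narrower than both.
print(at_least_as_narrow(merged, left), at_least_as_narrow(merged, right))
```

Even in this toy version, "pick the narrowest of the sources" has no
answer for left versus right; the only narrower hint is one that nobody
actually wrote.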

Which brings me to the user side: users won't necessarily know which type
hints are actually being applied to each transform in their pipeline. If
everything works, that's great, but once type check errors start being
thrown and the user cannot directly map the hints involved to what they
provided (particularly if we expand the use of trivial inference, which is
fully invisible to the user), there will be a lot more frustration. Having a
consistent hierarchy of how hints get applied makes much more sense
(although the way the function decorator approach currently overrides the
other methods is not super clear or ideal, IMO). I'm still an advocate for
PEP 484-style hints as the best way to annotate Python code, since they
give static checks at code-writing time in addition to our pipeline
construction checks, and they are the most Pythonic approach. I'm also not
a huge fan of expanding our dependency on the trivial inference module,
since that code depends on CPython internals that now change with every
minor version release.
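
To make the hierarchy point concrete, here is a minimal sketch of a
fixed-priority lookup, using the order from your message (decorator hints,
then function annotations, then trivial inference). resolve_output_hint
and its source labels are hypothetical names for illustration, not
anything in the SDK:

```python
from typing import Any, Optional, Tuple


def resolve_output_hint(
    decorator_hint: Optional[Any] = None,
    annotation_hint: Optional[Any] = None,
    inferred_hint: Optional[Any] = None,
) -> Tuple[str, Any]:
    """Return (source, hint) for the first source that supplied a hint.

    Hypothetical helper; the names and return shape are made up for illustration.
    """
    candidates = (
        ("decorator", decorator_hint),
        ("annotation", annotation_hint),
        ("trivial_inference", inferred_hint),
    )
    for source, hint in candidates:
        if hint is not None:
            return source, hint
    return "unspecified", Any


# The annotation wins over the inferred hint even though int is narrower,
# and the reported source tells the user exactly where the hint came from.
print(resolve_output_hint(annotation_hint=float, inferred_hint=int))
```

With a fixed order, a type check error can always be traced back to one
place the user can look. A "narrowest wins" rule would lose that.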

For the code example given, you've actually changed the declared contract
of extract_x by removing the hints and falling back to trivial inference.
An int is acceptable where a float is expected, but a float is not
acceptable where an int is expected, so the type error here is between
extract_x (declared to return float) and get_bit_length (declared to take
int). As written, that failure is valid.
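
Stripped of Beam entirely, the same two functions show why the declared
hints don't line up. This is just a plain-Python sketch of the runtime
behavior, not what the pipeline type checker does internally:

```python
from typing import Tuple


def extract_x(coord: Tuple[float, float]) -> float:
    """Extract the x coordinate from a tuple."""
    return coord[0]


def get_bit_length(value: int) -> int:
    """Get the bit length of an integer."""
    return value.bit_length()


# With int inputs the chain happens to work at runtime.
print(get_bit_length(extract_x((1, 2))))

# With float inputs, which extract_x's hints explicitly allow, it breaks.
try:
    get_bit_length(extract_x((1.5, 2.0)))
except AttributeError as err:
    print(f"float input fails: {err}")
```

The hinted version promises any float, so rejecting it at construction
time catches the second case before it can happen on a worker.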

I do agree that Rows and Schemas need more robust type checking; there are
some issues checking between the two (particularly at xlang transform
boundaries). That's a pretty clear gap in the code base right now that
would be great to close.

Thanks,

Jack McCluskey

On Tue, Jul 15, 2025 at 7:44 PM Joey Tran <joey.t...@schrodinger.com> wrote:

> Hey all,
>
> @Jack McCluskey <jrmcclus...@google.com>'s great talk on Python type
> hinting at the Beam Summit taught me that the "trivially" inferred types
> are only used if the beam Map/FlatMap/Filter/ParDo functions aren't already
> type hinted. This seems like it could be a waste of information. For
> example, the following pipeline fails due to a type error:
>
> ```
> from typing import Tuple
>
> import apache_beam as beam
>
> def extract_x(coord: Tuple[float, float]) -> float:
>     """Extract the x coordinate from a tuple."""
>     return coord[0]
>
> def get_bit_length(value: int) -> int:
>     """Get the bit length of an integer."""
>     return value.bit_length()
>
> with beam.Pipeline() as p:
>   (p
>    | beam.Create([(1, 2)])
>    | beam.Map(extract_x)
>    | beam.Map(get_bit_length)
>    | beam.LogElements())
> ```
>
> But if you take away the `extract_x` type hints (allowing the types to get
> trivially inferred) it passes and the final pcollection actually has a
> narrower type than before.
>
> Instead of just taking type hints by priority order
> (@typehints.with_output_types > function type hints > trivial_inference),
> would it make sense to just take the narrowest type of the three type
> sources? I suspect we'd need to have some kind of phase-in period to do
> something like this but it seems worth the extra type checking security.
>
> I think trivially inferred type hints might be particularly useful as we
> introduce more schema-oriented transforms
> (e.g. @with_input/output_type(beam.Row) will have very little typing
> information to check with)
>
