My two cents: I think it's valuable for explicit type hints to
override implicit ones for a couple of reasons:

- If there's a bug in inference, at least there's a way to override that.
- Explicit type hints can better capture intent. E.g. in the above
example, one might type the output as float to reserve the freedom of
doing different arithmetic operations in the future (i.e. it declares
a contract that may be stricter than the current implementation).
This can also be useful when types get more complex, e.g. saying the
output is a Map[int, Sequence[int]] rather than some monstrosity of a
union based on the particular implementation (see the sketch after
this list).
- If one sees explicit type hints, one can reason about those alone,
rather than having to wonder whether implicit inference will narrow them.
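
A tiny sketch of that second point (made-up function, just to
illustrate; inference would derive the narrower Dict[int, List[int]]
here):

```
from typing import Mapping, Sequence

def bucket(n: int) -> Mapping[int, Sequence[int]]:
    # The annotation keeps the contract at the interface level, even though
    # the current body would be inferred as the narrower Dict[int, List[int]].
    return {n: [n]}
```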

It would be nice to get rid of the (now almost entirely redundant)
with_input/output_type decorators that we needed in the Python 2 days,
and huge +1 to improving Row hinting.

- Robert


On Wed, Jul 16, 2025 at 8:13 AM Jack McCluskey via dev
<dev@beam.apache.org> wrote:
>
> Hey Joey,
>
> I have a few concerns around creating a more variable approach to determining
> types. On the SDK side we'd be trying to pull hints from each route and then
> doing some comparison across each individual component of the hint to
> determine which is the narrowest. For the simplest case, where the inferred
> hints have direct inheritance relationships, you're taking the more specific
> one. But once you get into types that aren't directly related, or something
> like a composite type, how are you determining which hint is narrower? It's
> not too hard to imagine a scenario where nested types within two composites
> are of varying scope, which greatly complicates the evaluation. dict[Any,
> int] and dict[int, Any] are compatible hints, and I couldn't tell you which
> is the narrower one. We could do some level of comparison at each field and
> combine the hints into dict[int, int]; however, I'm not particularly keen on
> the idea of overriding user-provided type hints with our own generated ones.
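>
> To make that concrete, here's a rough illustration using Beam's own hint
> machinery (just a sketch; the exact is_consistent_with results are my
> assumption):
>
> ```
> from apache_beam import typehints
> from apache_beam.typehints.typehints import is_consistent_with
>
> a = typehints.Dict[typehints.Any, int]
> b = typehints.Dict[int, typehints.Any]
>
> # Each hint appears consistent with the other, so neither is strictly
> # narrower; a "take the narrowest" rule has to invent a merge like
> # Dict[int, int] instead.
> print(is_consistent_with(a, b))  # True (assumed)
> print(is_consistent_with(b, a))  # True (assumed)
> ```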
>
> That brings me to the user side: users won't necessarily know which type
> hints are actually being applied to each transform in their pipeline. If
> everything works that's great, but once type check errors start being
> thrown and the user cannot directly map those hints to what they provided
> (particularly if we expand the use of trivial inference, which is fully
> invisible to the user), there will be a lot more frustration. Having a
> consistent hierarchy of how hints get applied makes much more sense
> (although the function decorator approach currently overriding other
> methods is not super clear or ideal IMO.)
> I'm still an advocate for PEP 484 style hints being the best way to annotate 
> Python code since it gives the static checks at code writing time in addition 
> to our pipeline construction checks and is the most Pythonic approach. I'm 
> also not a huge fan of expanding dependency on the trivial inference module, 
> since that code depends on CPython internals that get updated every minor 
> version release now.
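>
> For reference, the kind of ordering I have in mind looks roughly like this
> (a sketch of the hierarchy, not the actual resolution code):
>
> ```
> import apache_beam as beam
>
> @beam.typehints.with_output_types(float)  # 1) decorator hint, highest priority
> def scale(x: int) -> int:                 # 2) PEP 484 annotation, used next
>     return x * 2                          # 3) trivial inference, the fallback
> ```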
>
> For the code example given, you've actually changed the definition of
> extract_x by removing the hints and going to trivial inference. Ints are
> floats, but floats are not ints, so the type error here is between
> extract_x and get_bit_length. As written, that failure is valid.
>
> I do agree that Rows and Schemas need more robust type checking; there are
> some issues checking between the two (particularly at xlang transform
> boundaries). That's a pretty clear gap in the code base right now that would
> be great to close.
>
> Thanks,
>
> Jack McCluskey
>
> On Tue, Jul 15, 2025 at 7:44 PM Joey Tran <joey.t...@schrodinger.com> wrote:
>>
>> Hey all,
>>
>> @Jack McCluskey's great talk on Python type hinting at the Beam Summit
>> taught me that the "trivially" inferred types are only used if the Beam
>> Map/FlatMap/Filter/ParDo functions aren't already type hinted. This seems
>> like it could be a waste of information. For example, the following
>> pipeline fails due to a type error:
>>
>> ```
>> from typing import Tuple
>>
>> import apache_beam as beam
>>
>> def extract_x(coord: Tuple[float, float]) -> float:
>>     """Extract the x coordinate from a tuple."""
>>     return coord[0]
>>
>> def get_bit_length(value: int) -> int:
>>     """Get the bit length of an integer."""
>>     return value.bit_length()
>>
>> with beam.Pipeline() as p:
>>   (p
>>    | beam.Create([(1, 2)])
>>    | beam.Map(extract_x)
>>    | beam.Map(get_bit_length)
>>    | beam.LogElements())
>> ```
>>
>> But if you take away the `extract_x` type hints (allowing the types to get
>> trivially inferred), it passes and the final PCollection actually has a
>> narrower type than before.
>>
>> Instead of taking type hints in priority order
>> (@typehints.with_output_types > function type hints > trivial_inference),
>> would it make sense to take the narrowest type of the three sources? I
>> suspect we'd need some kind of phase-in period to do something like this,
>> but it seems worth the extra type-checking safety.
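>>
>> Roughly what I'm imagining (a purely hypothetical helper, not an existing
>> Beam API; using is_consistent_with as the narrowness check is an
>> assumption on my part):
>>
>> ```
>> from apache_beam.typehints.typehints import is_consistent_with
>>
>> def narrowest(decorator_hint, annotation_hint, inferred_hint):
>>     """Pick the most specific of the hints that are present (hypothetical)."""
>>     candidates = [h for h in (decorator_hint, annotation_hint, inferred_hint)
>>                   if h is not None]
>>     if not candidates:
>>         return None
>>     best = candidates[0]
>>     for hint in candidates[1:]:
>>         # A hint that is consistent with the current best, but not the other
>>         # way around, is strictly narrower, so prefer it.
>>         if is_consistent_with(hint, best) and not is_consistent_with(best, hint):
>>             best = hint
>>     return best
>> ```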
>>
>> I think trivially inferred type hints might be particularly useful as we 
>> introduce more schema-oriented transforms (e.g. 
>> @with_input/output_type(beam.Row) will have very little typing information 
>> to check with).
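>>
>> For example (a sketch; a NamedTuple schema is one way to carry field
>> types, while a bare beam.Row hint carries none):
>>
>> ```
>> from typing import NamedTuple
>>
>> import apache_beam as beam
>>
>> class Point(NamedTuple):
>>     x: float
>>     y: float
>>
>> # Hinting just beam.Row says nothing about the fields downstream
>> # transforms will see...
>> make_rows = beam.Map(lambda t: beam.Row(x=t[0], y=t[1])).with_output_types(beam.Row)
>>
>> # ...whereas a concrete schema type gives the checker real field information.
>> make_points = beam.Map(lambda t: Point(x=t[0], y=t[1])).with_output_types(Point)
>> ```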
