Hey all,
@Jack McCluskey <[email protected]>'s great talk on Python type
hinting at Beam Summit taught me that the "trivially" inferred types
are only used if the Beam Map/FlatMap/Filter/ParDo callables aren't already
type hinted. This seems like a waste of information. For
example, the following pipeline fails due to a type error:
```
import apache_beam as beam
from typing import Tuple


def extract_x(coord: Tuple[float, float]) -> float:
    """Extract the x coordinate from a tuple."""
    return coord[0]


def get_bit_length(value: int) -> int:
    """Get the bit length of an integer."""
    return value.bit_length()


with beam.Pipeline() as p:
    (p
     | beam.Create([(1, 2)])
     | beam.Map(extract_x)
     | beam.Map(get_bit_length)
     | beam.LogElements())
```
But if you remove the `extract_x` type hints (so the types are
trivially inferred instead), the pipeline passes, and the final PCollection
actually has a narrower type than before.
Instead of just taking type hints in priority order
(@typehints.with_output_types > function type hints > trivial_inference),
would it make sense to take the narrowest of the three type
sources? I suspect we'd need some kind of phase-in period for a change
like this, but it seems worth the extra type-checking safety.
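For illustration, here's a minimal sketch of the "narrowest type wins" idea. The `narrowest` helper is hypothetical and uses plain subclass relationships as a stand-in for Beam's actual type-compatibility logic, which handles composite hints like `Tuple[...]` and `Optional[...]`:

```python
def narrowest(*types):
    """Pick the narrowest (most specific) of the given types.

    Hypothetical sketch: a None entry means "no hint from this source";
    specificity is approximated by subclassing. Returns None if the
    types are incomparable, i.e. no single narrowest type exists.
    """
    best = None
    for t in types:
        if t is None:
            continue  # this source provided no type information
        if best is None or issubclass(t, best):
            best = t  # t is at least as specific as the current best
        elif not issubclass(best, t):
            return None  # incomparable types; no clear winner
    return best


# bool is narrower than int, which is narrower than object
assert narrowest(object, int, bool) is bool
# a missing source (None) is simply skipped
assert narrowest(None, int) is int
```

Under this rule, a decorator hint of `object` would no longer mask a narrower trivially inferred type like `int`.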
I think trivially inferred type hints might become particularly useful as we
introduce more schema-oriented transforms
(e.g. @with_input_types/@with_output_types(beam.Row) will have very little
typing information to check against).