Hey all,

@Jack McCluskey <jrmcclus...@google.com>'s great talk on Python type
hinting at the Beam Summit taught me that the "trivially" inferred types
are only used if the Beam Map/FlatMap/Filter/ParDo functions aren't already
type hinted. This seems like a waste of information. For
example, the following pipeline fails with a type error:

```
from typing import Tuple

import apache_beam as beam


def extract_x(coord: Tuple[float, float]) -> float:
    """Extract the x coordinate from a tuple."""
    return coord[0]

def get_bit_length(value: int) -> int:
    """Get the bit length of an integer."""
    return value.bit_length()

with beam.Pipeline() as p:
  (p
   | beam.Create([(1, 2)])
   | beam.Map(extract_x)
   | beam.Map(get_bit_length)
   | beam.LogElements())
```

But if you remove the `extract_x` type hints (allowing the types to be
trivially inferred), the pipeline passes, and the final PCollection
actually has a narrower type than before.
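To see why removing the hints helps, here is the same logic as plain Python, with no Beam involved (a sketch for illustration only): the values flowing through really are ints, so trivial inference can deduce `Tuple[int, int] -> int` for `extract_x`, which satisfies `get_bit_length`'s declared `int` input, whereas the annotated version's declared `float` output does not.

```python
def extract_x(coord):  # unannotated: the type would be trivially inferred
    """Extract the x coordinate from a tuple."""
    return coord[0]


def get_bit_length(value: int) -> int:
    """Get the bit length of an integer."""
    return value.bit_length()


elements = [(1, 2)]
out = [get_bit_length(extract_x(c)) for c in elements]
# extract_x((1, 2)) is the int 1, and (1).bit_length() == 1,
# so out == [1]
```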

Instead of just taking type hints in priority order
(@typehints.with_output_types > function type hints > trivial_inference),
would it make sense to take the narrowest of the three type
sources? I suspect we'd need some kind of phase-in period for a change
like this, but it seems worth the extra type-checking safety.
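As a rough sketch of what "take the narrowest" could mean, here is a toy
helper (`narrowest` is hypothetical, not an existing Beam API): given the
candidate output types from each source, prefer the most specific one. This
toy version only narrows along real subclass chains via `issubclass`; a real
implementation would presumably use Beam's typehints compatibility checks
(e.g. `is_consistent_with`) instead.

```python
def narrowest(*candidates):
    """Return the most specific of the non-None candidate types.

    Toy illustration: approximates "narrower than" with issubclass,
    which only works for simple nominal classes (e.g. bool < int).
    """
    result = None
    for t in candidates:
        if t is None:
            continue
        if result is None or (
            isinstance(t, type)
            and isinstance(result, type)
            and issubclass(t, result)
        ):
            result = t
    return result


# decorator says object, function hint says int, trivial inference
# says bool: the narrowest wins
print(narrowest(object, int, bool))  # -> bool
```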

I think trivially inferred type hints could be particularly useful as we
introduce more schema-oriented transforms
(e.g. @with_input_types/with_output_types(beam.Row) will have very little
typing information to check against).
