Matteo Santamaria created ARROW-17901:
-----------------------------------------

             Summary: `pyarrow` missing `py.typed` marker
                 Key: ARROW-17901
                 URL: https://issues.apache.org/jira/browse/ARROW-17901
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
            Reporter: Matteo Santamaria

I understand that, in general, `pyarrow` does not support type hints. However, I think it is still sensible to add a `py.typed` marker file to the library. Let me demonstrate why,

```
$ pip install mypy pyarrow
```

```python
# test.py
import pyarrow as pa

table = pa.Table()
reveal_type(table)
```

```
$ mypy test.py
test.py:1: error: Skipping analyzing "pyarrow": module is installed, but missing library stubs or py.typed marker
test.py:1: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
test.py:5: note: Revealed type is "Any"
Found 1 error in 1 file (checked 1 source file)
```

Note that `mypy` identifies `table` as being an `Any` type, when obviously it is a `Table`. If we include a `py.typed` file, `mypy` will be able to make these trivial inferences.

The motivating example is this,

```python
@overload
def from_arrow(a: pa.Table) -> DataFrame: ...

@overload
def from_arrow(a: pa.Array | pa.ChunkedArray) -> Series: ...

def from_arrow(a: pa.Table | pa.Array | pa.ChunkedArray) -> DataFrame | Series:
    pass
```

The problem is that all of `pa.Table`, `pa.Array`, and `pa.ChunkedArray` are determined to be `Any`, so the overloads effectively become

```python
@overload
def from_arrow(a: Any) -> DataFrame: ...

@overload
def from_arrow(a: Any) -> Series: ...
```

and `mypy` complains that overload 2 is covered entirely by overload 1.

I tried to test what adding a `py.typed` file would do, but I ran into compilation issues. I was hoping someone with a little more experience could quickly test this out for me :)
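
One way to try this without rebuilding `pyarrow` from source might be to drop an empty `py.typed` file into the package directory that `pip` already installed. The sketch below does that for any importable package. The helper name `add_py_typed_marker` is hypothetical, and it assumes the installed package directory is writable. It only creates the marker file described in PEP 561; it does not establish what `mypy` will infer afterwards, since much of `pyarrow` is compiled rather than plain Python.

```python
import importlib.util
from pathlib import Path


def add_py_typed_marker(package_name: str) -> Path:
    """Create an empty py.typed file inside an installed package.

    Hypothetical helper for local experiments only: it edits the
    installed copy of the package and changes nothing upstream.
    """
    # Locate the package without importing it.
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        raise ModuleNotFoundError(
            f"{package_name!r} is not an importable package"
        )

    # A regular package has a single directory in its search locations.
    package_dir = Path(next(iter(spec.submodule_search_locations)))

    # PEP 561 only requires the file to exist; its contents may be empty.
    marker = package_dir / "py.typed"
    marker.touch(exist_ok=True)
    return marker
```

Calling `add_py_typed_marker("pyarrow")` in the same environment and then re-running `mypy test.py` should show whether the marker alone changes the revealed type. Deleting the returned path undoes the experiment.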