Tian Gao created SPARK-54951:
--------------------------------
Summary: Make mypy pass with scipy
Key: SPARK-54951
URL: https://issues.apache.org/jira/browse/SPARK-54951
Project: Spark
Issue Type: Task
Components: PySpark
Affects Versions: 4.2.0
Reporter: Tian Gao
Currently, even though mypy passes in CI, it fails locally. The reason is that
CI does not install scipy, which includes stubs for numpy. scipy is part of
requirements.txt, so it is installed locally.
The problems mypy reports are real - CI just failed to catch them.
We could fix them all at once, but some fixes are safer than others. To avoid
having everything reverted (and to make review easier), we will fix them one
category at a time:
* Unused "type: ignore" comment - disable this warning in the config, because
the comment is sometimes needed.
* Obviously wrong code
* Vague use of Iterable[float] - this is technically wrong. I think it was
written like this because we were lazy. An Iterable[float] could be infinite,
and an infinite iterable can't be used as a vector. We should have a type alias
for all types that can be converted to a vector.
* Incompatible type due to scipy version - I think spmatrix is no longer a
first-class citizen. Many methods are no longer defined on this base class. We
can add some assertions, but this is not the safest change.
* Incompatible type due to numpy version - this is a bit tricky. We made an
effort to enforce numpy >= 2.0, but it got reverted. Type checkers are not very
smart about this kind of situation. We should probably ignore these errors
explicitly, with comments saying that we should fix them after we drop numpy
1.x.
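
To illustrate the Iterable[float] item above, here is a minimal sketch of what
such a type alias could look like. The names VectorConvertible and to_dense are
hypothetical and are not existing PySpark APIs, and the sketch uses only
standard-library types; a real alias would presumably also cover numpy arrays.

```python
from array import array
from typing import List, Tuple, Union

# Hypothetical alias (an assumption, not PySpark's actual definition).
# Every member is a finite, sized container, unlike Iterable[float],
# which also admits unbounded generators.
VectorConvertible = Union[List[float], Tuple[float, ...], array, range]


def to_dense(values: VectorConvertible) -> List[float]:
    """Materialize a finite input as a list of floats (hypothetical helper)."""
    dense = [float(v) for v in values]
    # len() is valid for every member of the alias, so the size is known.
    assert len(dense) == len(values)
    return dense


print(to_dense((1, 2, 3)))
print(to_dense(array("d", [0.5, 1.5])))
```

With an alias like this, a type checker should reject passing a generator such
as itertools.count() to to_dense, whereas an Iterable[float] annotation would
accept it.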