Tian Gao created SPARK-54951:
--------------------------------

             Summary: Make mypy pass with scipy
                 Key: SPARK-54951
                 URL: https://issues.apache.org/jira/browse/SPARK-54951
             Project: Spark
          Issue Type: Task
          Components: PySpark
    Affects Versions: 4.2.0
            Reporter: Tian Gao


Currently, mypy passes in CI but fails locally. The reason is that CI does not 
install scipy, which includes stubs for numpy. scipy is part of 
requirements.txt, so it is installed locally.

The problems mypy reports are real; CI just failed to catch them.

We could fix them all at once, but some fixes are safer than others. To avoid 
reverts (and to make review easier), we will fix them in separate 
categories:
 * Unused "type: ignore" comment - disable this check in the mypy config, 
because the comment is sometimes needed.
 * Obviously wrong code
 * Vague use of Iterable[float] - this is technically wrong. I think it was 
written this way out of laziness. An Iterable[float] could be infinite, 
which can't be used as a vector. We should have a type alias for all types 
that can be converted to a vector.
 * Incompatible type due to scipy version - I think spmatrix is no longer a 
first-class citizen. Many methods are no longer defined on this base class. We 
can add some assertions, but this is not the safest change.
 * Incompatible type due to numpy version - this is a bit tricky. There was an 
effort to enforce numpy >= 2.0, but it was reverted. Type checkers are not 
very smart about this kind of situation. We should probably ignore these 
errors explicitly, with comments noting that we should fix them after we drop 
numpy 1.x.
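
To illustrate the Iterable[float] item, here is a rough sketch of what such a 
type alias might look like. The names VectorConvertible and to_dense_values 
are hypothetical and are not existing PySpark names. The sketch uses only 
standard-library types; a real alias would presumably also include 
numpy.ndarray and the PySpark vector classes.

```python
from typing import List, Tuple, Union

# Hypothetical alias (not an existing PySpark name): only finite, sized
# containers are listed, so an unbounded iterator does not match it.
VectorConvertible = Union[List[float], Tuple[float, ...], range]


def to_dense_values(values: VectorConvertible) -> List[float]:
    """Copy a finite container into a list of floats."""
    return [float(v) for v in values]


# These calls type-check: both arguments are sized, finite containers.
print(to_dense_values([1, 2, 3]))
print(to_dense_values((0.5, 1.5)))

# With this alias, mypy would reject an unbounded iterator such as
# itertools.count(), which a parameter typed Iterable[float] would accept:
# to_dense_values(itertools.count())
```

The point of the alias is that the accepted types are spelled out once and 
reused, instead of each signature falling back to Iterable[float].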



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
