[
https://issues.apache.org/jira/browse/SPARK-42647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dongjoon Hyun closed SPARK-42647.
---------------------------------
> Remove aliases from deprecated numpy data types
> -----------------------------------------------
>
> Key: SPARK-42647
> URL: https://issues.apache.org/jira/browse/SPARK-42647
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.3.3, 3.4.0
> Reporter: Aimilios Tsouvelekakis
> Assignee: Aimilios Tsouvelekakis
> Priority: Minor
> Fix For: 3.3.3, 3.4.1
>
>
> Numpy has started deprecating and removing aliases for some of its data
> types. This means that users on a recent numpy version will face either
> warnings or errors, depending on the type they are using. This affects all
> users running numpy > 1.20.0. One of the types was already fixed back in
> September with this [pull request|https://github.com/apache/spark/pull/37817].
> The problem can be split into two types (see the sketch after this list):
> * [numpy 1.24.0|https://github.com/numpy/numpy/pull/22607]: The scalar type
> aliases ending in a 0 bit size (np.object0, np.str0, np.bytes0, np.void0,
> np.int0, np.uint0) as well as np.bool8 are now deprecated and will eventually
> be removed. As of numpy 1.25.0 they emit a DeprecationWarning.
> * [numpy 1.20.0|https://github.com/numpy/numpy/pull/14882]: Using aliases of
> builtin types such as np.int is deprecated since numpy 1.20.0 and removed in
> numpy 1.24.0.
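>
> A minimal sketch mapping each deprecated alias to a stable replacement (the
> replacements follow the numpy release notes; this is illustrative, not the
> exact diff in the PR):
> {code:python}
> import numpy as np
>
> # Type 1 (deprecated in numpy 1.24.0): "0-bit-size" scalar aliases.
> np.bool_    # instead of np.bool8
> np.object_  # instead of np.object0
> np.str_     # instead of np.str0
> np.bytes_   # instead of np.bytes0
> np.void     # instead of np.void0
> np.intp     # instead of np.int0
> np.uintp    # instead of np.uint0
>
> # Type 2 (deprecated in 1.20.0, removed in 1.24.0): aliases of builtins.
> int         # or np.int_, instead of np.int
> float       # or np.float64, instead of np.float
> bool        # or np.bool_, instead of np.bool
> object      # instead of np.object
> str         # instead of np.str
> {code}
>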
> The changes are needed so that pyspark stays compatible with the latest numpy
> and avoids the following (both failure modes are reproduced in the sketch
> after this list):
> * attribute errors on the data types deprecated in version 1.20.0 and removed
> in 1.24.0: [https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations]
> * warnings on the data types deprecated in version 1.24.0:
> [https://numpy.org/devdocs/release/1.24.0-notes.html#deprecations]
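>
> Both failure modes can be reproduced directly (output assumes numpy 1.24.x or
> 1.25.x; on numpy 2.x the 0-bit-size aliases raise AttributeError as well):
> {code:python}
> import warnings
> import numpy as np
>
> # Aliases of builtins were removed in numpy 1.24.0, so access fails hard:
> try:
>     np.float  # use float or np.float64 instead
> except AttributeError as e:
>     print(e)  # module 'numpy' has no attribute 'float' ...
>
> # The 0-bit-size aliases still resolve in 1.24.x/1.25.x but warn:
> with warnings.catch_warnings(record=True) as caught:
>     warnings.simplefilter("always")
>     np.bool8  # deprecated alias for np.bool_
> print(caught[0].category)  # <class 'DeprecationWarning'>
> {code}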
>
> From my research I see the following:
> The only functional changes are those related to the conversion.py file. The
> rest of the changes are inside tests in the user_guide or in docstrings
> describing specific functions. Since I am not an expert in these tests, I
> will wait for the reviewer and for people with more experience in the pyspark
> code.
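>
> As an illustration of the kind of functional change involved (a hypothetical
> sketch, not the actual edit in conversion.py; the mapping below is invented
> for the example):
> {code:python}
> import numpy as np
>
> # Before: deprecated aliases, which break on numpy >= 1.24.0:
> #   {"boolean": np.bool, "string": np.str, "binary": np.object}
> # After: the stable spellings, valid on every supported numpy version:
> dtype_mapping = {
>     "boolean": np.bool_,
>     "string": np.str_,
>     "binary": np.object_,
> }
>
> print({name: np.dtype(t) for name, t in dtype_mapping.items()})
> {code}
>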
> The replacement types are, or alias, classic python types, so they should
> work with all numpy versions
> [1|https://numpy.org/devdocs/release/1.20.0-notes.html],
> [2|https://stackoverflow.com/questions/74844262/how-can-i-solve-error-module-numpy-has-no-attribute-float-in-python].
> The error or warning only comes from accessing the deprecated attribute on
> the numpy module, as the check below shows.
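>
> A quick check (valid on any numpy version, to my knowledge) that the builtin
> spellings behave the same as dtype arguments:
> {code:python}
> import numpy as np
>
> # Builtin python types are accepted as dtypes in every numpy version;
> # np.float and np.int were only ever aliases for the builtins.
> a = np.zeros(3, dtype=float)  # same as np.float64 on common platforms
> b = np.zeros(3, dtype=int)
> c = np.zeros(3, dtype=bool)
> print(a.dtype, b.dtype, c.dtype)  # float64 int64 bool (int width varies by platform)
> {code}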
>
> For the affected versions I chose to include 3.3 and onwards, but I see that
> 3.2 is also still within the 18-month maintenance cadence, as it was released
> in October 2021.
>
> The pull request: [https://github.com/apache/spark/pull/40220]
> Best Regards,
> Aimilios
--
This message was sent by Atlassian Jira
(v8.20.10#820010)