[ https://issues.apache.org/jira/browse/SPARK-42647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean R. Owen resolved SPARK-42647.
----------------------------------
    Fix Version/s: 3.3.3
                   3.4.1
       Resolution: Fixed

Issue resolved by pull request 40220
[https://github.com/apache/spark/pull/40220]

> Remove aliases from deprecated numpy data types
> -----------------------------------------------
>
>                 Key: SPARK-42647
>                 URL: https://issues.apache.org/jira/browse/SPARK-42647
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.3.3, 3.4.0, 3.4.1
>            Reporter: Aimilios Tsouvelekakis
>            Assignee: Aimilios Tsouvelekakis
>            Priority: Major
>             Fix For: 3.3.3, 3.4.1
>
> NumPy has started changing the aliases of some of its data types. This means
> that users on recent numpy releases will face either warnings or errors,
> depending on the type they are using; it affects everyone running
> numpy > 1.20.0. One of the types was already fixed back in September with
> this [pull request|https://github.com/apache/spark/pull/37817].
> The problem splits into two cases:
> [numpy 1.24.0|https://github.com/numpy/numpy/pull/22607]: The scalar type
> aliases ending in a 0 bit size (np.object0, np.str0, np.bytes0, np.void0,
> np.int0, np.uint0, as well as np.bool8) are now deprecated and will
> eventually be removed. As of numpy 1.25.0 they emit a warning.
> [numpy 1.20.0|https://github.com/numpy/numpy/pull/14882]: Using aliases of
> builtin types like np.int is deprecated since numpy 1.20.0 and removed since
> numpy 1.24.0.
> The changes are needed so PySpark can stay compatible with the latest numpy
> and avoid:
> * attribute errors on data types deprecated in version 1.20.0:
> [https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations]
> * warnings on data types deprecated in version 1.24.0:
> [https://numpy.org/devdocs/release/1.24.0-notes.html#deprecations]
>
> From my research I see the following:
> The only functional changes are in the conversion.py file. The rest of the
> changes are inside tests, in the user_guide, or in some docstrings describing
> specific functions. Since I am not an expert in these tests, I defer to the
> reviewers and to people with more experience in the PySpark code.
> These types are aliases for classic Python types, so yes, they should work
> with all numpy versions
> [1|https://numpy.org/devdocs/release/1.20.0-notes.html],
> [2|https://stackoverflow.com/questions/74844262/how-can-i-solve-error-module-numpy-has-no-attribute-float-in-python];
> the error or warning comes only from accessing them through numpy.
>
> For the affected versions I chose to include 3.3 and onwards, but I see that
> 3.2 is also still within the 18-month maintenance cadence, as it was released
> in October 2021.
>
> The pull request: [https://github.com/apache/spark/pull/40220]
>
> Best Regards,
> Aimilios
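A minimal, hypothetical sketch of the two migrations described in the issue; it is not taken from the Spark patch itself, it only contrasts the deprecated aliases with spellings that work on every numpy release:

{code:python}
import numpy as np

# Portable spellings: name the concrete scalar type directly instead of
# using an alias that numpy has deprecated or removed.
values = np.array([1.0, 2.0], dtype=np.float64)  # instead of np.float (removed in 1.24.0)
flags = np.array([True], dtype=np.bool_)         # instead of np.bool8 (deprecated in 1.24.0)
labels = np.array(["a"], dtype=np.str_)          # instead of np.str0 (deprecated in 1.24.0)

# On numpy >= 1.24.0 the builtin-type aliases are gone entirely, so touching
# one raises AttributeError rather than a DeprecationWarning:
try:
    np.int  # deprecated in 1.20.0, removed in 1.24.0; use int or np.int64
except AttributeError as exc:
    print(exc)
{code}

Since these aliases were just the Python builtins, passing float, int, or str directly as the dtype is equally valid on every numpy version.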